RESULT 1 
AAU96984 

ID AAU96984 standard; protein; 651 AA. 
XX 

AC AAU96984; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 protein. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 
KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 
KW chromosome 2p21. 
XX 

OS Homo sapiens. 
XX 



FH Key Location/Qualifiers 

FT Misc-difference 2..15 

FT /note= "Encoded by GGTCTC" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR N-PSDB; ABK51681. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 52; Page 35-36; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 protein of the invention. This 

CC sequence is encoded by the human ABCG5 gene located on chromosome 2p21 

XX 

SQ Sequence 651 AA; 

Query Match 100.0%; Score 3326; DB 5; Length 651; 
Best Local Similarity 100.0%; Pred. No. 0; 

Matches 651; Conservative 0; Mismatches 0; Indels 0; Gaps 0 



Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I 
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 



Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

I I I I I I I I I I I I I I I I I I I II I II I I I II I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I 
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

Qy 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480 

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I J I I I I I I I I I I I I I I I I I 

Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I 
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I M I I I I I I I I I II I I I I 
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 



RESULT 2 
AAE13290 

ID AAE13290 standard; protein; 651 AA. 
XX 

AC AAE13290; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Human sitosterolaemia susceptibility gene (SSG) protein. 
XX 

KW Human; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolemia; therapy; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 2p21. 



XX 

OS Homo sapiens. 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 

PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR N-PSDB; AAD22009. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 19; Fig 8; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is human SSG protein. Human SSG is located on chromosome 

CC 2p21 

XX 

SQ Sequence 651 AA; 

Query Match 100.0%; Score 3326; DB 5; Length 651; 
Best Local Similarity 100.0%; Pred. No. 0; 

Matches 651; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

I I I I I I I 1 I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 



Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I 
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

Qy 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I 
Db 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480 

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II 
Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 



RESULT 3 
AAE31704 

ID AAE31704 standard; protein; 651 AA. 
XX 

AC AAE31704; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Human ABCG5 protein. 
XX 

KW ABC family cholesterol transporter; ABCG8; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW human; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5. 
XX 

OS Homo sapiens. 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823 . 
XX 

PR 20-NOV-2000; 2000US-0252235P . 

PR 28-NOV-2000; 2000US-0253645P . 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 

DR N-PSDB; AAD48882. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 28; Page 78-79; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG). Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is human ABCG5 protein 
XX 

SQ Sequence 651 AA; 

Query Match 100.0%; Score 3326; DB 6; Length 651; 

Best Local Similarity 100.0%; Pred. No. 0; 

Matches 651; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

Qy 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 



Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

I I I 111 I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I 
Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I II I I I I I I I 
Db 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480 

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I II 
Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I 
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I 
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 



RESULT 4 
AAU96992 

ID AAU96992 standard; protein; 651 AA. 
XX 

AC AAU96992; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant E146Q protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 146 

FT /note= "Wild-type Glu substituted by Gln" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 



XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 12; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant E146Q protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 



Query Match 99.9%; Score 3323; DB 5; Length 651; 

Best Local Similarity 99.8%; Pred. No. 0; 

Matches 650; Conservative 1; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I 
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRQTLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 



Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 



Qy 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I 
Db 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

I I I I I I I I I I I II I I II I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I M I I I I I I I I I 
Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I 
Db 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480 

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

I I I I I II I I I I I I I I II I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I 
Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I II I I I I I I I I I I I 
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I II I 
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 
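
The percentages in each summary can be cross-checked from the integer fields printed beside them. The Python sketch below rests on two assumptions that the report itself does not state: that Query Match is the hit's Score divided by the query's self-score (taken here as 3326, the Score of the 100.0% hits above), and that Best Local Similarity is Matches divided by the aligned length, which equals Length for these gap-free hits. The names parse_summary, percent and QUERY_SELF_SCORE are illustrative only.

```python
import re

# Assumption: 3326 is the query's self-score, read off the 100.0% hits above.
QUERY_SELF_SCORE = 3326

SUMMARY_KEYS = ("Score", "Length", "Matches", "Conservative",
                "Mismatches", "Indels", "Gaps")


def parse_summary(text):
    """Collect the integer fields from a result's summary lines."""
    fields = {}
    for key in SUMMARY_KEYS:
        found = re.search(rf"\b{key}\s+(\d+)", text)
        if found:
            fields[key] = int(found.group(1))
    return fields


def percent(part, whole):
    """Percentage rounded to one decimal place, as printed in the report."""
    return round(100.0 * part / whole, 1)


# Summary lines of RESULT 4 (AAU96992), copied from the report.
summary = (
    "Query Match 99.9%; Score 3323; DB 5; Length 651;\n"
    "Matches 650; Conservative 1; Mismatches 0; Indels 0; Gaps 0;"
)
fields = parse_summary(summary)
query_match = percent(fields["Score"], QUERY_SELF_SCORE)
identity = percent(fields["Matches"], fields["Length"])
print(query_match, identity)
```

Under these assumptions the computed values agree with the printed 99.9% and 99.8% for RESULT 4; the same check on the Score of 3321 in RESULT 5 and RESULT 6 gives 99.8%.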



RESULT 5 
AAU96990 

ID AAU96990 standard; protein; 651 AA. 
XX 

AC AAU96990; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R389H protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 389 

FT /note= "Wild-type Arg substituted by His" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 



XX 

PA (USSH-) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 7; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R389H protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 



Query Match 99.8%; Score 3321; DB 5; Length 651; 

Best Local Similarity 99.8%; Pred. No. 0; 

Matches 650; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 



Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I II I I I I I I I I I I I I I I I I I 
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

Qy 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

I I I I I I I ! I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I II I I II I I I I I I I I I I I I I I 
Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITHLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

Qy 421 GLLYQFVGATPYTGMLNAWLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWATM 480 

I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 GLLYQFVGATPYTGMLNAWLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWATM 480 

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 
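
The mutant records say their sequences are derived from the wild-type protein (AAU96984), not shown in the specification. The Python sketch below illustrates one way such a derivation could be reproduced from a substitution code like R389H. The function apply_substitution and its offset parameter are hypothetical names introduced for illustration, and the test row is the wild-type 361..420 segment as it appears in the alignments above.

```python
import re


def apply_substitution(seq, code, offset=1):
    """Apply a substitution code such as 'R389H' to seq.

    offset is the residue number of seq[0], so a single alignment row
    can be used without the full-length protein.
    """
    parsed = re.fullmatch(r"([A-Z])(\d+)([A-Z])", code)
    if parsed is None:
        raise ValueError(f"unrecognised substitution code: {code!r}")
    wild, position, mutant = parsed.group(1), int(parsed.group(2)), parsed.group(3)
    index = position - offset
    if not 0 <= index < len(seq):
        raise ValueError(f"position {position} lies outside the given segment")
    if seq[index] != wild:
        # Guard against applying a code to the wrong sequence or numbering.
        raise ValueError(
            f"expected {wild} at position {position}, found {seq[index]}"
        )
    return seq[:index] + mutant + seq[index + 1:]


# Wild-type residues 361..420, taken from the Qy rows above.
ROW_361 = "SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV"

r389h = apply_substitution(ROW_361, "R389H", offset=361)
r419h = apply_substitution(ROW_361, "R419H", offset=361)
print(r389h)
print(r419h)
```

The wild-type check is the useful part of the design: it fails loudly when the numbering or the input sequence does not match the feature table, instead of silently producing a wrong mutant.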



RESULT 6 
AAU96989 

ID AAU96989 standard; protein; 651 AA. 
XX 

AC AAU96989; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R419H protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 419 

FT /note= "Wild-type Arg substituted by His" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 



XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 9; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R419H protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 



Query Match 99.8%; Score 3321; DB 5; Length 651; 

Best Local Similarity 99.8%; Pred. No. 0; 

Matches 650; Conservative 0; Mismatches 1; Indels 0; Gaps 0 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 



Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 



Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

I I I I I I I I I I I I I I I I I I I I I I I I i I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

Qy 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDHV 420 

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWATM 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWATM 480 

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 



Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 


RESULT 7 
AAU96993 

ID AAU96993 standard; protein; 651 AA. 
XX 

AC AAU96993; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R419P protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 419 

FT /note= "Wild-type Arg substituted by Pro" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 10; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R419P protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 651 AA; 

Query Match 99.8%; Score 3319; DB 5; Length 651; 
Best Local Similarity 99.8%; Pred. No. 0; 

Matches 650; Conservative 0; Mismatches 1; Indels 0; Gaps 0; 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 



Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

I I 11 I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I 
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

Qy 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I II I I I II II I I I I I I I I I I I I I I I I I I 
Db 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDPV 420 

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWATM 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II 
Db 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWATM 480 

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIA 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIA 540 

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 



RESULT 8 

ABP52128 

ID ABP52128 standard; protein; 649 AA. 
XX 

AC ABP52128; 
XX 

DT 10-OCT-2002 (first entry) 
XX 

DE Homo sapiens ABC transporter ABCG5 protein SEQ ID NO: 80. 
XX 

KW ATP-binding cassette transporter; ABC transporter; modulation; D loop; 

KW cancer; bacterial infection; fungal infection; protozoal infection; 

KW antibacterial; fungicide; protozoacide . 
XX 

OS Homo sapiens. 
XX 



PN EP1217066-A1. 
XX 

PD 26-JUN-2002. 
XX 

PF 21-DEC-2000; 2000EP-00870316 . 
XX 

PR 21-DEC-2000; 2000EP-00870316 . 
XX 

PA (UYGE-) UNIV GENT . 
XX 

DR WPI; 2002-550404/59. 
XX 

PT Modulating activity of ATP-binding cassette (ABC) transporters by 

PT influencing dimerization of nucleotide binding domains through use of D 

PT loop sequence of an ABC transporter, or its antisense peptide or peptide 

PT mimetic. 

XX 

PS Disclosure; Fig 3; 290pp; English. 
XX 

CC The present invention describes a method (Ml) for modulating the activity 

CC of ATP-binding cassette (ABC) transporters by influencing the 

CC dimerisation of the nucleotide binding domains comprises using: (a) a 

CC polypeptide (polyp) consisting of 5-50 amino acids comprising the D loop 

CC sequence of an ABC transporter (ABP52049 to ABP52091) ; (b) a polyP 

CC consisting of the D loop sequence of an ABC transporter; (c) a peptide 

CC mimetic or antisense peptide of (a) or (b). ABC transporters have 

CC antibacterial, fungicide and protozoacide activities. (Ml) is useful for 

CC selectively modulating the activity of ABC transporters belonging to the 

CC group of multidrug transporter/P-glycoproteins. Bacterial, fungal or 

CC protozoal ABC transporters are involved in the infection of a mammal or 

CC in the induction of resistance to antibiotics or drugs in a mammal. (Ml) 

CC is useful for preventing, treating or alleviating diseases associated 

CC with functionality of an ABC transporter. ABP52092 to ABP52140 represent 

CC ABC transporter proteins given in the exemplification of the present 

CC invention 

XX 

SQ Sequence 649 AA; 

Query Match 99.3%; Score 3304; DB 5; Length 649; 
Best Local Similarity 99.7%; Pred. No. 0; 

Matches 649; Conservative 0; Mismatches 0; Indels 2; Gaps 1 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I II I I I I I I I I I I I I I I I I I I I I 

Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

I I I I I I II I I I I I I I I I I I I I I I I I II I M I I I I I I I II II I I I I I I I I I I I I I I I I I I I 

Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 

Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLF--PTTGLDCMTANQIVVLLVELAR 238 



Qy 241 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 239 RNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 298 

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 299 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 358 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 359 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 418 

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWATM 480 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 419 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWATM 478 

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 540 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I 
Db 479 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIA 538 

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 600 

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 539 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPMC 598 

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 599 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 649 



RESULT 9 
AAE13309 

ID AAE13309 standard; protein; 652 AA. 
XX 

AC AAE13309; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG) protein variant #2. 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; mutein; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; mutant; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; therapy; variant. 
XX 

OS Mus sp. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 28 

FT /note= "Wild type Gly substituted with Ala" 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 



XX 

PR 18-APR-2000; 2000US-0198465P . 

PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Disclosure; Page; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG protein variant obtained by replacing Gly28 

CC with Ala. Note: The present sequence is not shown in the specification 

CC but is derived from mouse SSG protein referred as SEQ ID NO: 1 (AAE13289) 

CC and shown in figure 7 of the specification 

XX 

SQ Sequence 652 AA; 

Query Match 82.6%; Score 2748.5; DB 5; Length 652; 
Best Local Similarity 80.4%; Pred. No. 2.4e-280; 

Matches 524; Conservative 64; Mismatches 63; Indels 1; Gaps 1 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

I I : I I : I I : I : I I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGELPFLSPEGARGPHINRGSLSSLEQASVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

Qy 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

I : I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I I II 111:111 

Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179 

I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I : : I I I I I I I I I I I I I 

Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239 

I I I : : I I : I : I I I I : I I I I I I II I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I III 

Db 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240 



Qy 240 RRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299 

I I : I I I I I I I I I I I I I I I I I I I I I : : I I I : I I I I I III I I I : M I I I I I I I I I I I 
Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I II I I i I I I I I I : I I I I I I 11111:1 I : I : I I II I : I I I I :: I I I I I I I I I I I I 
Db 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

Qy 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

I I I : I II I I I I I I I I I I I : I I I III I I : I II I I I I I I : I : : I I I : : I I I I I : I I I 
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

Qy 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWAT 479 

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I : I I I 11111111:11 

Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 

Qy 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSI 539 

: I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 

Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599 

: I : I : I I I I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I II I II I : : I I 
Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPM 600 

Qy 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

II I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I I : I I I I 

Db 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652 



RESULT 10 
AAE13289 

ID AAE13289 standard; protein; 652 AA. 
XX 

AC AAE13289; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG) protein. 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolemia; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; chromosome 17; therapy. 

XX 

OS Mus sp. 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 

PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 



XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 

DR N-PSDB; AAD22008. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Claim 19; Fig 7; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 

CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolaemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG protein. Mouse SSG is located on chromosome 

CC 17 

XX 

SQ Sequence 652 AA; 

Query Match 82.5%; Score 2744.5; DB 5; Length 652; 

Best Local Similarity 80.2%; Pred. No. 6.3e-280; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1; 



Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

I I : I I : I I : I Mill I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

Qy 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

I : I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I : I I I 

Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179 

I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I : : I I I I I I I I I I I I I 

Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239 

I I I : : II : I : I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I III 

Db 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240 

Qy 240 RRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299 

I I : I I I :: I I I I I I I I I I I I I I I I I I : : I M : I I I I I III I I I : I I I I I I I I I I I I I 

Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 



Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I I I I I I I I I I I I : I I I I I I I I I I I : I I : I : I I II I : I I II : : I I I I I I I I I I I I 

Db 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

Qy 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

I I I : I I I I I I I I I I I I I I : I I I III I I : I M I I I I I I : I : : I I I : : I I I I I : I I I 
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

Qy 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWAT 479 

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I 1111:111 I I I I I I I I : I I 

Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 

Qy 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSI 539 

: II I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I II I I I I I I I I I I I II I I I I I : I I I I I I 
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 

Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPM 599 

: I : I : I I I I : I I I I I I I II I I : I I I I I I I I I I I I I II I I I I I II I I II I : : I I 
Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPM 600 

Qy 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

II I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I I : I I I I 

Db 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652 



RESULT 11 
AAE31702 

ID AAE31702 standard; protein; 652 AA. 
XX 

AC AAE31702; 
XX 

DT 24-MAR-2003 (first entry) 
XX 

DE Mouse ABCG5 protein. 
XX 

KW ABC family cholesterol transporter; ABCG8 ; sterol-related disorder; 

KW sitosterolaemia; hyperlipidaemia; hypercholesterolaemia; gall stone; 

KW HDL deficiency; atherosclerosis; nutritional deficiency; gene therapy; 

KW mouse; ATP-binding cassette; sitosterolaemia susceptibility gene; SSG; 

KW ABCG5 . 
XX 

OS Mus sp. 
XX 

PN WO200281691-A2. 
XX 

PD 17-OCT-2002. 
XX 

PF 20-NOV-2001; 2001WO-US043823 . 
XX 

PR 20-NOV-2000; 2000US-0252235P . 

PR 28-NOV-2000; 2000US-0253645P . 
XX 

PA (TULA-) TULARIK INC. 

PA (TEXA ) UNIV TEXAS SYSTEM. 

XX 

PI Hobbs HH, Shan B, Barnes R, Tian H; 
XX 

DR WPI; 2003-058548/05. 



DR N-PSDB; AAD48880. 
XX 

PT New ABCG8 polypeptides and nucleic acids, useful for treating sterol- 

PT related disorders e.g. sitosterolemia, hypercholesterolemia, 

PT hyperlipidemia, gall stones, HDL deficiency, atherosclerosis, or 

PT nutritional deficiencies. 

XX 

PS Claim 28; Page 74; 94pp; English. 
XX 

CC The invention relates to ATP-binding cassette (ABC) family cholesterol 

CC transporter, ABCG8 polypeptides and polynucleotides. The invention also 

CC provides ABCG5 polypeptides and polynucleotides. ABCG5 gene is also known 

CC as sitosterolaemia susceptibility gene (SSG) . Sequences of the invention 

CC are useful for treating or preventing sterol-related disorders such as 

CC sitosterolaemia, hyperlipidaemia, hypercholesterolemia, gall stones, HDL 

CC deficiency, atherosclerosis and nutritional deficiencies. They are also 

CC useful in gene therapy. The present sequence is mouse ABCG5 protein 
XX 

SQ Sequence 652 AA; 



Query Match 82.5%; Score 2744.5; DB 6; Length 652; 

Best Local Similarity 80.2%; Pred. No. 6.3e-280; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1; 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

I I : I I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

Qy 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

I : I : I I I I I I II I I I : I I I I 1 I I I I I I I I I I I I I I I I I : I I I I I II I I I : I I I 

Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179 

I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I : : I I I I I I I I I I I I I 
Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239 

I I I : : I I : I : I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I III 
Db 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240 

Qy 240 RRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299 

I I : I I I : : I I I I I I I I I I I I I I I I I I : : I I I : I I I I I III I I I : I I I I I I I I I I I I I 
Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I I I I I I I I I I I I : I I I I I I I I I I I : I I : I : I I II I : I I II : : I I I I I I I I I I I I 
Db 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

Qy 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

I II : I I I I I I I I II I I I I : I I I III I I : I I I I I I I I I : I : : I I I :: I I I I I : I I I 
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

Qy 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWAT 479 

I I I I I I I I II I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I : I I 
Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 



Qy 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSI 539 

: I I I I I I I I I I I I : I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 

Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPM 599 

Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPM 600 

Qy 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

Db 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652 



RESULT 12 
AAE13308 

ID AAE13308 standard; protein; 652 AA. 
XX 

AC AAE13308; 
XX 

DT 12-FEB-2002 (first entry) 
XX 

DE Mouse sitosterolaemia susceptibility gene (SSG) protein variant #1. 
XX 

KW Mouse; sitosterolaemia susceptibility gene; SSG; atherosclerosis; mutein; 

KW sterol-related disorder; hyperlipidaemia; hypercholesterolaemia; mutant; 

KW gall stone; coronary heart disease; cardiovascular disease; arthritis; 

KW xanthoma; haemolytic anaemia; transgenic animal; therapy; variant. 
XX 

OS Mus sp. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 17 

FT /note= "Wild type Ile substituted with Leu" 
XX 

PN WO200179272-A2. 
XX 

PD 25-OCT-2001. 
XX 

PF 18-APR-2001; 2001WO-US012758 . 
XX 

PR 18-APR-2000; 2000US-0198465P . 

PR 15-MAY-2000; 2000US-0204234P . 
XX 

PA (TULA-) TULARIK INC. 
XX 

PI Tian H, Schultz J, Shan B; 
XX 

DR WPI; 2002-017598/02. 
XX 

PT Novel sitosterolemia susceptibility gene polypeptide and polynucleotide, 

PT useful for screening a compound that increases the level of expression or 

PT activity of SSG polypeptide for treating sterol-related disorder. 
XX 

PS Disclosure; Page; 105pp; English. 
XX 

CC The invention relates to an isolated Sitosterolaemia Susceptibility Gene 

CC (SSG) polypeptide. SSG is a member of adenosine triphosphate (ATP) 



CC binding cassette (ABC) family cholesterol transporter. SSG is useful for 

CC identifying a compound useful in the treatment or prevention of a sterol- 

CC related disorder, including sitosterolaemia, hyperlipidaemia, 

CC hypercholesterolemia, gall stones, HDL deficiency, atherosclerosis or 

CC nutritional deficiencies. SSG is also useful for treating cholesterol- 

CC associated diseases or conditions including coronary heart disease and 

CC other cardiovascular diseases, and sitosterolaemia-associated condition 

CC including arthritis, xanthomas and chronic haemolytic anaemia. SSG 

CC expression cassette is useful in the production of transgenic non-human 

CC animals. SSG genes and their homologues are useful as tools for a number 

CC of applications including diagnosing sitosterolaemia and other 

CC cardiovascular disorders, for forensics and paternity determinations, and 

CC for treating any of a large number of SSG associated diseases. The 

CC present sequence is mouse SSG protein variant obtained by replacing Ile17 

CC with Leu. Note: The present sequence is not shown in the specification 

CC but is derived from mouse SSG protein referred as SEQ ID NO: 1 (AAE13289) 

CC and shown in figure 7 of the specification 

XX 

SQ Sequence 652 AA; 



Query Match 82.5%; Score 2742.5; DB 5; Length 652; 

Best Local Similarity 80.2%; Pred. No. 1e-279; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

11:1 I : I I : I : I I I I I I I I I I I I I I : I I I I I I i : I I I I I : I I 

Db 1 MGELPFLSPEGARGPHLNRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

Qy 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

I : I : I I I I I I I I I I I : I I I I I I I I I II I I I I I I I I I I I : I I I I I II I I I : I I I 
Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 12 0 

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179 

I I I : I II I I I I I I I I I I I I I I I I I I I I llhll: I : : I I I I I I I I I I I I I 

Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239 

I I I : : I I : I : I I I I : I I I I I I I I I I I I I I I I I I I : I II I I I I I I I I I I I I I : I I III 
Db 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240 

Qy 240 RRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299 

I I : I I I :: I I I I I I I I I I I I I I I I I I : : I I I : I I I I I III I I I : I I I I II I I I I I I I 
Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I I I I I I I II I I I : I I I I I I I I I I I : I I : I : I I II I : I I I I :: I I II I I M I I I I 
Db 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

Qy 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

I I I : I I I I I I I I I I I I I I : I I I III I I : I I I I I I I I I : I : : I I I : : I I I II : I I I 
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

Qy 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479 

MINI I I I I I I I I I I I I I I I I I : I I I I I I I I I I II I I I I I I : I I I I I II I I I I : I I 
Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 



Qy 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539 

Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 

Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599 

Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600 

Qy 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651 

Db 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652 



RESULT 13 
AAU96985 

ID AAU96985 standard; protein; 652 AA. 
XX 

AC AAU96985; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Mouse ABCG5 protein. 
XX 

KW Mouse; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease. 
XX 

OS Mus sp. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 638..652 

FT /note= "Encoded by CTAG" 

XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR N-PSDB; ABK51684. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 42; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 



CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the mouse ABCG5 protein of the invention 

XX 

SQ Sequence 652 AA; 

Query Match 82.3%; Score 2738.5; DB 5; Length 652; 

Best Local Similarity 80.1%; Pred. No. 2.7e-279; 

Matches 522; Conservative 64; Mismatches 65; Indels 1; Gaps 1; 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

11:1 I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

Qy 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

I : I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I : I I I 
Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179 

I I I : I I I I M I I I I I I I I I I I I I I I I I I I I : I I : I : : I I I I I I I I I I I I I 

Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMIALCRSSADFYNKKVEAVMTELSLSH 180 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239 

I I I : : I I : I : I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I = I I I I I 
Db 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240 

Qy 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299 

I I : I I I :: I I I I I I I I I I I I I I I I I I : : I I I : I I I I I III I I I : I II I I I I I I I I II 
Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I I I I I I I I I I I I : I II I I I I I I I I : I I : I : I I II I : I I I I : : I I I I I I I I I I I 
Db 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPTVPFKTK 360 

Qy 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

I I I : I I I I I I I I I I I I II : I II III I I : I I I I I I I I I : I : : I I I :: I I I II : I I I 
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

Qy 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479 

I I I I I I I I I II II I I I I I I I I I I : I I I II I I II I I I I I lllhlll I I I I I I II : I I 
Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 



Qy 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539 

: I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 

Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599 

Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600 

Qy 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651 

Db 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652 



RESULT 14 
AAU96986 

ID AAU96986 standard; protein; 652 AA. 
XX 

AC AAU96986; 
XX 

DT 07-AUG-2003 (revised) 

DT 30-JUL-2002 (first entry) 

XX 

DE Rat ABCG5 protein. 
XX 

KW Rat; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease. 
XX 

OS Rattus sp. 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 

DR N-PSDB; ABK51686. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Example 3; Page 45; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 

CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 



CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the rat ABCG5 protein of the invention. (Updated 

CC on 07-AUG-2003 to correct OS field.) 

XX 

SQ Sequence 652 AA; 



Query Match 82.0%; Score 2727.5; DB 5; Length 652; 

Best Local Similarity 79.4%; Pred. No. 3.9e-278; 

Matches 518; Conservative 68; Mismatches 65; Indels 1; Gaps 1 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

I I : I I : I I : I I I I I I I I I I I I I I I I : I : I : I I I : I I I I I : I I 

Db 1 MGELPFLSPEGARGPHNNRGSQSSLEEGSVTGSEARHSLGVLNVSFSVSNRVGPWWNIKS 60 

Qy 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

I : I : I I : I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I : I II 
Db 61 CQQKWDRKILKDVSLYIESGQTMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179 

I I I : I I I I I IIMIII I I I I I I I I I I I I I : I I : I : : MINI: I I I I I I 
Db 121 LRRDQFQDCVSYLLQSDVFLSSLTVRETLRYTAMLALRSSSADFYDKKVEAVLTELSLSH 180 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239 

I I I : : I I I I : I I M : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I : I I I I II 
Db 181 VADQMIGNYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANHIVLLLVELA 240 

Qy 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299 

I I I I I I :: I I I I I I I I I I I I I I I I I :: I I I : I I I I I III I I I : I I I I I I I I I I I I I 
Db 241 RRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I I I I I I I I I I I I : I I I I I I I I II I : I I I : : : I I I I I I : I I II : I I I I I I I I I I I I I 
Db 301 FYMDLTSVDTQSREREIETYKRVQMLESAFRQSDICHKILENIERTRHLKTLPMVPFKTK 360 

Qy 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

: I I : I I I I I I I I I I I I I I : I I I II I I : I I I I I I I I I : I : : I I I : : I : I I I I : I I I 
Db 361 NPPGMFCKLGVLLRRVTRNLMRNKQWIMRLVQNLIMGLFLIFYLLRVQNNMLKGAVQDR 420 

Qy 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479 

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I : I I I II I I I I : I I I 
Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYQKWQMLLAYVLHALPFSIVAT 480 

Qy 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539 

: I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I II I : I I I I I I 
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGMVQNPNIVNSIVALLSI 540 



Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599 

: I : I : I I I I : I I I : I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I II II III 
Db 541 SGLLIGSGFIRNIEEMPIPLKILGYFTFQKYCCEILWNEFYGLNFTCGGSNTSVPNNPM 600 

Qy 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651 

I : II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I : I I : I I I I 
Db 601 CSMTQGIQFIEKTCPGATSRFTTNFLILYSFIPTLVILGMWFKVRDYLISR 652 



RESULT 15 
AAU96991 

ID AAU96991 standard; protein; 408 AA. 
XX 

AC AAU96991; 
XX 

DT 30-JUL-2002 (first entry) 
XX 

DE Human ABCG5 mutant R408X protein sequence. 
XX 

KW Human; ABCG5; ATP-binding cassette gene 5; sitosterolemia; cholesterol; 

KW arteriosclerosis; heart disease; hypersterolemia; Alzheimer's disease; 

KW mutant; mutein. 
XX 

OS Homo sapiens. 

OS Synthetic. 
XX 

FH Key Location/Qualifiers 

FT Misc-difference 408 

FT /note= "Wild-type protein truncated at this position" 
XX 

PN WO200227016-A2. 
XX 

PD 04-APR-2002. 
XX 

PF 25-SEP-2001; 2001WO-US029859 . 
XX 

PR 25-SEP-2000; 2000US-0235268P . 
XX 

PA (USSH ) US DEPT HEALTH & HUMAN SERVICES. 

PA (PATE/) PATEL S B. 

PA (DEAN/) DEAN M. 
XX 

PI Patel SB, Dean M; 
XX 

DR WPI; 2002-416483/44. 
XX 

PT Novel mammalian ATP-binding cassette gene 5 polypeptide, and the nucleic 

PT acid encoding the polypeptide, useful for treating sitosterolemia, 

PT arteriosclerosis and heart diseases. 
XX 

PS Claim 10; Page; 66pp; English. 
XX 

CC The present invention relates to a new mammalian ATP-binding cassette 

CC gene 5 (ABCG5) polypeptide. The invention is useful for identifying a 

CC predisposition for developing sitosterolemia, arteriosclerosis or heart 

CC disease. The molecules of the invention are also useful for identifying a 



CC compound which alters ABCG5 activity level comprising contacting a cell 

CC culture or mammal which have ABCG5 polypeptide with a compound and 

CC measuring ABCG5 biological activity in the cell culture or in mammal, 

CC where an increase or decrease in ABCG5 biological activity compared to 

CC ABCG5 biological activity in a control cell culture or mammal not 

CC contacted with the compound, identifies a compound that increases or 

CC decreases ABCG5 activity respectively. The cell culture or mammal 

CC comprises a mutated ABCG5 polypeptide or a wild type polypeptide. The 

CC ABCG5 biological activity, or level of ABCG5 mRNA, or level of the 

CC polypeptide in a cell culture or mammal is also compared with that of a 

CC second cell culture or mammal comprising a wild type ABCG5 polypeptide. 

CC Stimulation of ABCG5 activity is useful for treating or preventing 

CC hypersterolemia, arteriosclerosis, heart disease and/or Alzheimer's 

CC disease. The method of the invention is useful for increasing cholesterol 

CC excretion and/or decreasing cholesterol adsorption. The present amino 

CC acid sequence represents the human ABCG5 mutant R408X protein of the 

CC invention. Note: This sequence is not shown in the specification but is 

CC derived from the wild-type human ABCG5 protein (AAU96984) given on pages 

CC 35-36 of the specification 

XX 

SQ Sequence 408 AA; 

Query Match 62.6%; Score 2081; DB 5; Length 408; 

Best Local Similarity 100.0%; Pred. No. 3.4e-210; 

Matches 408; Conservative 0; Mismatches 0; Indels 0; Gaps 0 

Qy   1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

Qy  61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180 

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

Qy 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300 

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVR 408 
       ||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVR 408 



Search completed: February 27, 2004, 06:44:21 
Job time : 48.4649 secs 



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: February 27, 2004, 07:11:48 ; Search time 14.7508 Seconds 
(without alignments) 
2278.426 Million cell updates/sec 

Title: US-09-989-981A-6 
Perfect score: 3326 
Sequence: 1 MGDLSSLTPGGSMGLQVNRG..........PALVILGIVVFKIRDHLISR 651 

Scoring table: BLOSUM62 
Gapop 10.0 , Gapext 0.5 

Searched: 389414 seqs, 51625971 residues 

Total number of hits satisfying chosen parameters: 389414 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 

Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 

Database : Issued_Patents_AA:* 

1: /cgn2_6/ptodata/2/iaa/5A_COMB.pep:* 

2: /cgn2_6/ptodata/2/iaa/5B_COMB.pep:* 

3: /cgn2_6/ptodata/2/iaa/6A_COMB.pep:* 

4: /cgn2_6/ptodata/2/iaa/6B_COMB.pep:* 

5: /cgn2_6/ptodata/2/iaa/PCTUS_COMB.pep:* 

6: /cgn2_6/ptodata/2/iaa/backfiles1.pep:* 

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
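
The Query Match and Best Local Similarity percentages listed for each hit appear to follow from simple arithmetic on the other figures in this report. The sketch below is an inference from the reported numbers, not something taken from the search software's documentation, and both function names are hypothetical. It assumes Query Match is the hit score divided by the perfect score (3326 for this query), and that Best Local Similarity is the number of identical positions divided by the total number of alignment columns.

```python
def query_match_percent(score: float, perfect_score: float) -> float:
    """Hit score as a percentage of the query's perfect (self-match) score.

    Assumed relationship, inferred from the figures in this report.
    """
    return 100.0 * score / perfect_score


def best_local_similarity_percent(matches: int, conservative: int,
                                  mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of all aligned columns.

    Assumed relationship, inferred from the figures in this report.
    """
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Figures reported for RESULT 1 of this search (US-09-245-808-1).
print(round(query_match_percent(682.5, 3326), 1))                # 20.5
print(round(best_local_similarity_percent(182, 138, 249, 55), 1))  # 29.2
```

Under these assumptions the two printed values agree with the 20.5% Query Match and 29.2% Best Local Similarity reported for the first hit.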

SUMMARIES 



Result            Query
   No.    Score   Match  Length  DB  ID                      Description

     1    682.5    20.5     655   4  US-09-245-808-1         Sequence 1, Appli
     2    674.5    20.3     655   4  US-09-767-594-1         Sequence 1, Appli
     3    436.5    13.1    1296   4  US-09-614-912-140       Sequence 140, App
     4    373.5    11.2     617   4  US-09-614-912-138       Sequence 138, App
     5    334.5    10.1     539   4  US-09-614-912-144       Sequence 144, App
     6    263.5     7.9     653   4  US-09-543-681A-5411     Sequence 5411, Ap
     7    259       7.8     384   4  US-09-489-039A-9127     Sequence 9127, Ap
     8    258.5     7.8     210   4  US-09-543-681A-8215     Sequence 8215, Ap
     9    253.5     7.6     373   4  US-09-543-681A-7638     Sequence 7638, Ap
    10    249.5     7.5     245   4  US-09-540-236-3618      Sequence 3618, Ap
    11    245       7.4     344   4  US-09-489-039A-13987    Sequence 13987, A
    12    244       7.3     248   4  US-09-134-001C-3731     Sequence 3731, Ap
    13    244       7.3    1280   2  US-08-752-447-2         Sequence 2, Appli
    14    244       7.3    1280   4  US-09-316-167-2         Sequence 2, Appli
    15    244       7.3    1280   4  US-09-397-233-2         Sequence 2, Appli
    16    243.5     7.3     276   4  US-09-489-039A-13021    Sequence 13021, A
    17    240       7.2    1279   2  US-08-784-649A-2        Sequence 2, Appli
    18    240       7.2    1279   4  US-09-672-810-6         Sequence 6, Appli
    19    240       7.2    1280   2  US-08-583-276-19        Sequence 19, Appl
    20    240       7.2    1280   4  US-09-767-594-2         Sequence 2, Appli
    21    240       7.2    1280   4  US-09-672-810-5         Sequence 5, Appli
    22    240       7.2    1280   6  5206352-4               Patent No. 5206352
    23    239.5     7.2     358   4  US-09-489-039A-7399     Sequence 7399, Ap
    24    239.5     7.2    1684   3  US-08-665-259-25        Sequence 25, Appl
    25    239.5     7.2    1684   3  US-08-762-500-25        Sequence 25, Appl
    26    239.5     7.2    1704   3  US-08-762-500-75        Sequence 75, Appl
    27    239       7.2     229   4  US-09-134-000C-3584     Sequence 3584, Ap
    28    237       7.1     254   4  US-09-489-039A-13102    Sequence 13102, A
    29    236.5     7.1     329   4  US-09-107-532A-4844     Sequence 4844, Ap
    30    234.5     7.1     266   4  US-09-252-991A-26488    Sequence 26488, A
    31    232.5     7.

ALIGNMENTS 



RESULT 1 
US-09-245-808-1 

; Sequence 1, Application US/09245808 

; Patent No. 6313277 

; GENERAL INFORMATION: 

; APPLICANT: Doyle, L. Austin 

; APPLICANT: Abruzzo, Lynne V. 

; APPLICANT: Ross, Douglas D. 

; TITLE OF INVENTION: Breast Cancer Resistance Protein (BCRP) and DNA which 

; TITLE OF INVENTION: encodes it 

; FILE REFERENCE: Ross UMb conversion 

; CURRENT APPLICATION NUMBER: US/09/245,808 

; CURRENT FILING DATE: 1999-02-05 

; EARLIER APPLICATION NUMBER: 60/073763 

; EARLIER FILING DATE: 1998-02-05 

; NUMBER OF SEQ ID NOS: 7 

; SOFTWARE: PatentIn Ver. 2.0 

; SEQ ID NO 1 



LENGTH: 655 
TYPE: PRT 

ORGANISM: Human MCF-7/AdrVp cells 
US-09-245-808-1 

Query Match 20.5%; Score 682.5; DB 4; Length 655; 

Best Local Similarity 29.2%; Pred. No. 2.9e-64; 

Matches 182; Conservative 138; Mismatches 249; Indels 55; Gaps 18; 

Qy 21 SQSSLEGAPATAP EPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77 

II: I II I I : : I :::::: I I : I I : : : I I : : : : : 

Db 13 SQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72 

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137 

I : I I I : I I I : : I I I : : I : I I : I : I I I I I : I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129 

Qy 138 LLSSLTVRETLHYTALLAIRRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196 

: : : I I I I I I : : I I : : : : h II I III : I : I : I I 

Db 130 VMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIRGVSGG 189 

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256 

II : I II : I : I I : : I I I I I I I I I I I : : : I I : : : : I : : : I I I I I : 
Db 190 ERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312 

I : M I : : I : I I : I I I I :| III : : I I I I : : I : : I : :: 
Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE IETSKR VQMI ESAYKKSAI CHKT LKNIERMKHLKTLPMVPF 356 

I : II II: :: : I I : : I I I : I : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKIAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL--LFFVLRVRSNVLKG 414 

I : I : : I : I I : I I I : : : :::|| : ::| I: I 
Db 370 TT S FCHQLRWVSKRS FKNLLGNPQAS I AQI I VTWLGLVI GAI YFGLKNDST 421 

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473 

I I : I I : I : I : : : I I I I I : : I I I : I I : I I 

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480 

Qy 474 FSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533 

: : : : : I I : : I : I I I I : I . I : : : : : I I : : I : 

Db 481 MTMLPSIIFTCIWFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQSWSVA 537 

Qy 534 VALLSIAGV--LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSN 591 

I : : I I : : I I I I : : : I I : : I I I I I I I I I 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFI EKTCPG 615 

: : I I I : III 
Db 595 LNATGNNPCNYA TCTG 610 



RESULT 2 
US-09-767-594-1 

; Sequence 1, Application US/09767594 



Patent No. 6521635 
GENERAL INFORMATION: 
APPLICANT: Bates, Susan 
APPLICANT: Robey, Robert 

APPLICANT: The Government of the United States of America 
APPLICANT: as represented by the Secretary of the 
APPLICANT: Department of Health and Human Services 

TITLE OF INVENTION: Inhibition of MXR Transport by Acridine Derivatives 
FILE REFERENCE: 01528 0-4 02 100US 
CURRENT APPLICATION NUMBER: US/09/767,594 
CURRENT FILING DATE: 2001-01-22 
PRIOR APPLICATION NUMBER: US 60/177,410 
PRIOR FILING DATE: 2000-01-20 
NUMBER OF SEQ ID NOS: 2 
SOFTWARE: PatentIn Ver. 2.1 
SEQ ID NO 1 
LENGTH: 655 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE: 

OTHER INFORMATION: human mitoxanthrone resistance (MXR) /BRCP/ABCP 
OTHER INFORMATION: protein 
US-09-767-594-1 

Query Match 20.3%; Score 674.5; DB 4; Length 655; 

Best Local Similarity 29.0%; Pred. No. 2.2e-63; 

Matches 181; Conservative 137; Mismatches 251; Indels 55; Gaps 18; 

Qy 21 SQSSLEGAPATAP EPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77 

II: I I I I : : I :::::: I I : || : : : I I : : : : : 

Db 13 SQGNTNGFPATVSNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72 

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137 

I : I I I : I I I : : I I I : : I : I I : I : I I I I I : I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129 

Qy 138 LLSSLTVRETLHYTALLAIRRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196 

: : : I I I I I I : : I I : : : : hill III : I : I : I I 

Db 130 WGTLTVRENLQFST^RLATTMTNHEKNERINRVIEELGLDIWADSKVGTQFIRGVSGG 189 

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256 

11:111 : I : I I : : I I I I I I I I I I I : : : I I : : : : I : : : I I I I I : 
Db 190 ERKRTSIGMELITDPSILSLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312 

I : I I I : : I : I I : I I I I : I III : : I I I I : : I : : I : : : 

Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE IETSKR VQMI ESAYKKS AI CHKT LKNIERMKHLKTLPMVPF 356 

I : II II: : : : I I : : I I I : I : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL--LFFVLRVRSNVLKG 414 

I : I : : I : I I : I II::: : : : I I : : : I I : I 
Db 370 TT S FCHQLRWVSKRS FKNLLGNPQAS IAQI I VTWLGLVI GAI YFGLKNDST 421 



Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473 



I I : I I : I : I : : : I I I I I : : I I I : I I : I I 

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480 

Qy 474 FSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533 

: : : : I I : : I : I I I I : I I : : : : : I I : : I : 

Db 481 MRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAI AAGQS WSVA 537 

Qy 534 VALLSIAGV--LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSN 591 

I : : I I : : I I I I : : : I I : : I I I I I I I I I 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFIEKTCPG 615 

: : I I I : III 
Db 595 LNATGNNPCNYA TCTG 610 



RESULT 3 

US-09-614-912-140 

Sequence 140, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 



APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 
TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912 
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS: 204 
SOFTWARE: Microsoft Office 97 
SEQ ID NO 140 
LENGTH: 1296 
TYPE: PRT 

ORGANISM: Oryza sativa 
US-09-614-912-140 



Query Match 13.1%; Score 436.5; DB 4; Length 1296; 

Best Local Similarity 27.6%; Pred. No. 3.7e-37; 



Matches 173; Conservative 99; Mismatches 245; Indels 109; Gaps 27; 



Qy 84 ILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLT 143 

: II I I I I I I I I :: I : I : III II I I : I : I : I : : I 

Db 9 LLGPPSSGKTTLLLALAGKLDPSLRRGGEVTYNGFELEEFVAQKTAAYISQTDVHVGEMT 68 

Qy 144 VRETLHYTAL LAI RRGNPG SFQK — KVEAVMAELSLSHV 180 

I : I I I : : I III I II : I I : I : 

Db 69 VKETLDFSARCQGVGTKYDLLTELARREKEAGIRPEPEVDLFMKATSMEGVESSLQTDYT 128 

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

I I : : I : I I I I : : : I I : : : I I : I I : I I I I I II 

Db 129 LRILGLDICADTIVGDQMQRGISGGQKKRVTTGEMIVGPTKVLFMDEISTGLDSSTTFQI 188 

Qy 232 WLLVELARRNRIWL-TIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYP 290 

I I : : : I : : I I I I : I I I I : I I I : : : : I : I : I I I I : 

Db 189 VKCLQQIVHLGEATILMSLLQPAPETFELFDDIILLSEGQIVYQGPREYVLEFFESCGFR 248 

Qy 291 CPEHSNPFDFYMDLTS VDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNI 342 

Ml I I : : I I I I I I : I : I : : : 
Db 249 CPERKGTADFLQEWSKKDQEQYWADKHRPYRYISVSEFAQ RFKRFHV GL 298 

Qy 343 ERMKHLKTLPMVPF-KTKDSPG — VFSKLGVLLRRVTRN LVRNKLAVI TRLL 391 

: II Mill: Mil l : : : I I I : : 

Db 299 QLENHLS VPFDKTRSHQAALVFSKQSVSTTELLKASFAKEWLLIKRNSFVYIFKTI 354 

Qy 392 QNLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQ 451 

I : I : I II:: : I I : I : I I : : : : I I I I : 

Db 355 QLIIVALVASTVFLRTQMHTRN — LDD — GFVY — IGALLFSLIVNMFNGFAELSLTITR 408 

Qy 452 ESQDGL-YQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAA 505 

: :| I I I I : MM:: :::: I |:|:| II II I 
Db 409 LPVFFKHRDLLFYPAWIFTLPNVILRIPFSIIESIVWVIVTYYTIGFAPEADRF — FKQL 466 

Qy 506 LLAPHLIGEF LTLVLLGIVQNPNIVNSWALLSIAGVLVGSGFLRNIQEMPIPFKII 562 

II II: I | : : : | : | | : : : I I I I Mil 

Db 467 LLV-FLIQQMAGGLFRATAGLCRSMI IAQTGGALALLI FFVLG-GFL LPKAF-IP 518 

Qy 563 SYFTFQKYCSEI LWNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGA 616 

::::[: I I I I I I : III MM II 
Db 519 KWWIWGYWVSPLMYGYNALAVNEFYSPRW MNKFVLDNNGVPKRLGIALME GA 570 

Qy 617 TSRFTMNFLI LYS FI PALVI LGIWF 642 

I : Ml Ml M 

Db 571 NIFTDKNWF WIGAAGLLGFTMF 592 



RESULT 4 

US-09-614-912-138 

Sequence 138, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni 
APPLICANT: Orozco, Buddy 
APPLICANT: Miao, Gou-Hau 
APPLICANT: Famodu, Omolayo O. 



APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA 
CURRENT APPLICATION NUMBER: US/09/614,912 
CURRENT FILING DATE: 2000-07-12 



PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS: 204 
SOFTWARE: Microsoft Office 97 
SEQ ID NO 138 
LENGTH: 617 
TYPE: PRT 

ORGANISM: Zea mays 
US-09-614-912-138 



Query Match 11.2%; Score 373.5; DB 4; Length 617; 

Best Local Similarity 25.2%; Pred. No. 6.5e-31; 

Matches 140; Conservative 101; Mismatches 205; Indels 109; Gaps 23; 

Qy 67 QILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFL-GEVYVNGRALRREQF 125 

I : I : : I : I : : : I I I : I I I I I : I : : I I : I : : I : : : I : I 

Db 37 Q LL RE VT G S FR P GVLT ALMGVS GAGKT T LMDVLAGR — KTGGYIEGDIRIAGYPKNQATF 94 

Qy 126 QDCFS YVLQSDTLLS SLTVRETLHYTALLAI RRGNPGS F QKKVEAVMAEL 175 

I I : I : I I I I : I I : I I : II : I : I I : 

Db 95 ARI SGYCEQNDIHS PQVTVRESLI YSAFLRL PGKIGDQEITDDIKMQFVDEVMELV 150 

Qy 176 SLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLL 235 

I : : I I : I : I : I I : I : I : : I I : I : : I : : I I I I : I I I I : : : 
Db 151 ELDNLRDALVGLPGITGLSTEQRKRLTIAVELVANPSIIFMDEPTSGLDARAAAIVMRTV 210 

Qy 236 VELARRNRIWLTIHQPRSELFQLFDKIAILS-FGELIFCG TPAEMLDFFNDC-GY 289 

I I I I I I I I : : I : I I : : : I I : : I : I : I : : : I | 

Db 211 RNTVDTGRTWCTIHQPSIDIFESFDELLLLKRGGQVIYSGKLGRNSQKMVEYFEAIPGV 270 

Qy 290 P-CPEHSNPFDFYMDLTSVDTQSK EREIETSKRVQMIESAYKKSAICHKTLKNIE 343

I : I I : : : : : II I : : : I I I I I : : : I I 

Db 271 PKIKDKYNPATWMLEVSSVATEVRLKMDFAKYYETS- DLYKQNKVLVNQLSQPE 323 

Qy 344 RMKHLKTLPMVP FKTKDSPGVFSKLGVLL RRVT RN LVRN KLAVI T R 389 

I I I : I : I I I I I I : : 

Db 324 PGTSDL YFPTEYSQSTI GQFKACLWKQWLTYWRS PDYNLVRYS FTLLVA 372 



Qy 390 LLQNLIMGLFLLFFVLRVRSNVLKGAIQDR VGL L YQ FVGAT P YT GMLN AVN L F P 443 

II I I : : I :: I : I : I I : I : I : I 

Db 373 LLLGSIF WRIGTN ME DATT L GMVT GAM YT AVMFIGINNCSTVQP 416 

Qy 444 VL RAVSDQESQDGLYQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARF- 499 

I : II : I I : I : | : : | : | | : : : | : : I 

Db 417 VVSIERTVFYRERAAGMYSAMPYAIAQVVIEIPYVFVQTTYYTLIW7\MMSFQWTAVKFF 476 

Qy 500 GYFSAALLAPHLI GEFLTLVLLGIVQ NPN-IVNSWALLSIAGVLVGSGFLR 550 

III II I : : : I I I I : I : : I I I 

Db 477 WFFFISYFS FLYFT YYGMMAVS I S PNHEVAS I FAAAFFS LFNLFS GFF- 524 

Qy 551 NIQEMPIPFKIISYF 565 

I II II: 

Db 525 -IPRPRIPGWWIWYY 538 



RESULT 5 

US-09-614-912-144 

Sequence 144, Application US/09614912 
Patent No. 6677502 
GENERAL INFORMATION: 
APPLICANT: Allen, Steve 
APPLICANT: Rafalski, Antoni
APPLICANT: Orozco, Buddy
APPLICANT: Miao, Gou-Hau
APPLICANT: Famodu, Omolayo O. 
APPLICANT: Lee, Jian Ming 
APPLICANT: Sakai, Hajime 
APPLICANT: Weng, Zude 
APPLICANT: Caimi, Perry G 
APPLICANT: Anderson, Shawn 

TITLE OF INVENTION: Plant Metabolism Genes 
FILE REFERENCE: BB1378 US NA
CURRENT APPLICATION NUMBER: US/09/614,912
CURRENT FILING DATE: 2000-07-12 
PRIOR APPLICATION NUMBER: 60/143,401 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/143,412 
PRIOR FILING DATE: 1999-07-12 
PRIOR APPLICATION NUMBER: 60/146,650 
PRIOR FILING DATE: 1999-07-30 
PRIOR APPLICATION NUMBER: 60/170,906 
PRIOR FILING DATE: 1999-12-15 
PRIOR APPLICATION NUMBER: 60/172,959 
PRIOR FILING DATE: 1999-12-21 
PRIOR APPLICATION NUMBER: 60/172,946 
PRIOR FILING DATE: 1999-12-21 
NUMBER OF SEQ ID NOS: 204 
SOFTWARE: Microsoft Office 97 
SEQ ID NO 144 
LENGTH: 539 
TYPE: PRT 

ORGANISM: Triticum aestivum
FEATURE:
NAME/KEY: UNSURE
LOCATION: (272)..(273)



US-09-614-912-144 



Query Match 10.1%; Score 334.5; DB 4; Length 539; 

Best Local Similarity 23.8%; Pred. No. 8.4e-27; 

Matches 120; Conservative 108; Mismatches 216; Indels 61; Gaps 15; 

Qy 107 GTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQK 166 

I I I : I : I : : I I I I : I : I : I : I : : I I : : I 

Db 2 GYIEGEITVSGYPKKQETFARISGYCEQNDIHSPHVTIYESLVFSAWLRLPAEVDSERRK 61 

Qy 167 K-VEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDC 225 

: I : I : I : : I : I : I : I I : I : I : : I I : I : : I : : I I I I : I I I 
Db 62 MFIEEIMDLVELTSLRGALVGLPGVNGLSTEQRKRLTIAVELVANPSIIFMDEPTSGLDA 121 

Qy 226 MTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILS-FGELIFCG TPAEM 280

I : : : I I I I I I I I : : | : | | : : : : I I I : I I : 

Db 122 RAAAI VMRTWNTWTGRTWCTIHQPSIDIFEAFDELFLMKRGGEEIYVGPVGQNST^NL 181 

Qy 281 LDFFNDC GYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESA 328

: : : I : II I I : : I : : I : : I : 

Db 182 IEYFEEIEGISKIKDGY N PAT WML E VS S S AQE EMLG I D FAEV 223 

Qy 329 YKKSAICHKTLKNIERMKHLKTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNK 383 

|::| : :| I :| I ::| : I |: I :: I : : II 

Db 224 YRQSELYQ RNKELIKEL-SMPAPGSSDLNFPTQYSRSFVTQCLACLWKQXXSYWRNP 279 

Qy 384 LAVITRLLQNLIMGLFL--LFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNL 441

I I I : : : I : I : I : : I I : I I I : I : : : 

Db 280 SYTAVRLLFTIVIALMFGTMFWDLGSKTR RSQDLFNAMGSMYAAVLYIGVQNSGSV 335 

Qy 442 FPVL RAVSDQESQDGLYQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVAR 498 

II: I I : I I : I : I : : I : I : : I : I I I : 

Db 336 QPWWERTVFYRERAAGMYSAFPYAFGQVAIEFPYVLVQALIYGGLVYSMIGFEWTVAK 395 

Qy 499 FGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIAGVLVG SGFLRNIQEM 555 

I :: : I | :: :|: | |: |::| | | I |:| :: 

Db 396 FLWYL FFMYFTML Y FT FYGMMAVGLT PN ESIAAIISSAFYNVWNLFSGYLIPRPKL 451 

Qy 556 PIPFKIISYFTFQKYCSEILWNEF 580 

II:: I : : I I : : I 

Db 452 PIWWRWYSWICPVAWTLYGLVASQF 476 



RESULT 6 

US-09-543-681A-5411 

Sequence 5411, Application US/09543681A 
Patent No. 6605709 
GENERAL INFORMATION: 
APPLICANT: GARY BRETON 

TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PROTEUS MIRABILIS FOR
TITLE OF INVENTION: DIAGNOSTICS AND THERAPEUTICS
FILE REFERENCE: 2709.1002-001
CURRENT APPLICATION NUMBER: US/09/543,681A
CURRENT FILING DATE: 2000-04-05 
PRIOR APPLICATION NUMBER: US 60/128,706 
PRIOR FILING DATE: 1999-04-09 



; NUMBER OF SEQ ID NOS: 8344
; SEQ ID NO 5411 
; LENGTH: 653 
; TYPE: PRT 

; ORGANISM: Proteus mirabilis 
US-09-543-681A-5411 

Query Match 7.9%; Score 263.5; DB 4; Length 653; 

Best Local Similarity 22.0%; Pred. No. 5.5e-19;

Matches 141; Conservative 111; Mismatches 203; Indels 187; Gaps 30; 

Qy 68 ILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGR 118 

: I : i I : : I : : : I : I : I I I I I : I I : : : I I : : II I I : 
Db 29 VLDQISLTINAGEMVAIIGASGSGKSTLMNIL-GCLDKPSS — GEYKVAGQCVADMESDQ 85 

Qy 119 — ALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI RRGNPGSFQKKVEAVMAELS 176 

Mill I :: I |:: II : : I : I : I ::: I I h 

Db 86 LAALRREH FG F I FQ RYHLMAH LTAEQNVE I P AI YAG K STEQRKERARALLT 136 

Qy 177 LSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI VVLLV 236 

: I : I : : I : I I : : : I I I I I I : : I : I I I I I II : : : : : I 

Db 137 RLGLAERI — HYRPSQLSGGQQQRVSIARALMNGGEVILADEPTGALDSQSGKEVMAILK 194 

Qy 237 ELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSN 296 

:| h= I I : | |:| : |::| II II 
Db 195 QLNQQGHTVI IVTHDPL — IAQQADRIIEIKDGQII SDNNN HHSA 237 

Qy 297 PFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPF 356 

I I : : | : : : | : : 
Db 238 P VKKVP PAI QTAS YFHQVI 256 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAI 416

II:: I I : I I I : : : I : I : I : : : : I : I 

Db 257 GRFTQ ALNMAWRAMWNKI RTLLTML-GI 1 1 GIASWTI I VIGDAA 301 

Qy 417 QD RVGLL YQ FVGAT P YT GMLN AVN L FP VL RAVS DQ ESQDGLYQKWQMMLAY 467 

: I I I : : I I I : : : : I II z I I I : : 
Db 302 KDRVLADIKAIGA NTIDI YPGKELGSDSPEDKQSLTIQDVDALKQQSYIQ 351 

Qy 468 ALHVLPFSWATMIFSS VCYWTLGLHPEVARFGYFSAALL APHL 511 

I I : I I I I : : : I II I I : 

Db 352 S VT PQI YFS S RLRRGNQDAPATVS GVNED YFSVYALKFAQGSTFTPDM 399 

Qy 512 IGEFLTLVLLGIVQN PN IVNSWALLSIAGVLVG SGFLRNIQ-EM 555 

I : | : | : | I I : : : : : I : : I III : 

Db 400 IHRQAQVW — IDENTRHRFFPNKQAVIGEQIIIRNIPSTIIGWAEQKSTFGDNKSLRV 457 

Qy 556 PIPFKIISYFTFQK-YCSEILV-VNEFYG LNFTCGS SNVSVTTNPMCAF 602 

: I : : I : : I I I I I I I I : : I 
Db 458 WVPYSTLS SRI YNRS YLDNITVKVKEGYDASVAEQQI LRLLTI RHGKKDI F 508 

Qy 603 TQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKI 644 

I I I I II : : : I I : I I I I 

Db 509 TYNI DS FI KAAEKTTQ — TMQLFLTLVAVI SLWGGI GVMNI 548 



RESULT 7 



US-09-489-039A-9127
; Sequence 9127, Application US/09489039A
; Patent No. 6610836
; GENERAL INFORMATION:
; APPLICANT: Gary Breton et. al
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA
; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 2709.2004001
; CURRENT APPLICATION NUMBER: US/09/489,039A
; CURRENT FILING DATE: 2000-01-27
; PRIOR APPLICATION NUMBER: US 60/117,747
; PRIOR FILING DATE: 1999-01-29
; NUMBER OF SEQ ID NOS: 14342
; SEQ ID NO 9127
; LENGTH: 384
; TYPE: PRT
; ORGANISM: Klebsiella pneumoniae
US-09-489-039A-9127 

Query Match 7.8%; Score 259; DB 4; Length 384; 

Best Local Similarity 23.9%; Pred. No. 6.7e-19; 

Matches 84; Conservative 70; Mismatches 133; Indels 64; Gaps 11; 

Qy 56 DITSCRQQWTR-QILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVY 114 

: I : : : : I I : I I : I I : I I I : : : I I I I I I I I I I I : : I : I : 

Db 24 EIANIKKSFGRTQVLNDISLDIPSGQMVALLGPSGSGKTTLLRIIAGLEHQTS GHIR 80 

Qy 115 VNGRALRREQFQD-CFSYVLQSDTLLSSLTVRETLHY — TALLAI RRGNPGS FQKKVEAV 171 

: I : I : I : I I I : I I : : : I I I I : : I I : 

Db 81 FHGTDVSRMHARDRKVGFVFQHYALFRHMTVFDNIAFGLTVLPRRERPNAAAIKAKVTKL 140

Qy 172 MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

: : I : I : I i I : I I : : : I I : : I I : I : : : I I I I II : : 

Db 141 LEMVQLAHLADRYPAQ LSGGQKQRVALARALAVEPQILLLDEPFGALDAQVRKEL 195 

Qy 232 VVLLVELARRNRI VVLTI HQPRS ELFQLFDKI AI LS FGELI FCGT PAE MLD 282 

I : I : : : : I : : I : : : : I I : I :|: 

Db 196 RRWLRQLHEELKFTSVFVTHDQEEAMEVADRVWMSQGNIEQADAPERVWREPSTRFVLE 255 

Qy 283 FFND CGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKR 321 

I : I I I : I I :: I I |: ::: 

Db 256 FMGEVNRLQGVI RGGQFHVGAHRWPLGY-TPAYQGPVDLFLRPWEVDI-SRRTSLDSPLP 313 

Qy 322 VQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLL 372 

I I : : I : : I I I : I II I hi 

Db 314 VQVLEASPK GHYTQLWQPLGWYDEP LSWL 344 



RESULT 8 

US-09-543-681A-8215 

; Sequence 8215, Application US/09543681A
; Patent No. 6605709
; GENERAL INFORMATION:
; APPLICANT: GARY BRETON
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PROTEUS MIRABILIS FOR
; TITLE OF INVENTION: DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 2709.1002-001
; CURRENT APPLICATION NUMBER: US/09/543,681A
; CURRENT FILING DATE: 2000-04-05
; PRIOR APPLICATION NUMBER: US 60/128,706
; PRIOR FILING DATE: 1999-04-09
; NUMBER OF SEQ ID NOS: 8344
; SEQ ID NO 8215
; LENGTH: 210
; TYPE: PRT
; ORGANISM: Proteus mirabilis
US-09-543-681A-8215 

Query Match 7.8%; Score 258.5; DB 4; Length 210; 

Best Local Similarity 33.7%; Pred. No. 2.6e-19; 

Matches 70; Conservative 45; Mismatches 80; Indels 13; Gaps 4;

Qy 65 TRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQ 124 

I I I : I I I : : I I : I I I I I I I I I I I I : I : : I I I I : 

Db 11 TTGILTEVSLHLEQGCCLGISGSSGSGKTTLLNAIAGYTDYTGDI VLANQNMNKLPV 67 

Qy 125 FQDCFSYVLQSDTLLSSLTVRETLHYTALLAI RRGNPGSFQKKVEAVMAELSLSHVADRL 184 



Db 68 WQRPCRYLNQRLYLFPFLTVKQNLWLAQYAAKQKRS KEKEIALLEQMGIAHLATRY 123 

Qy 185 IGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRI 244

I I I I : : I I : : I I : I I : : I I I I : II I I : I : : I : : 
Db 124 PSQ ISGGEQQRVALARALISQPKLLLMDEPFSSLDWETRYQLWELIISLKKQQIT 178 

Qy 245 WLTIHQPRSELFQLFDKIAILSFGELI 272 

: : : I : II I I III : I I I : : : 
Db 179 MI IVTHEPR-ELQALADKTLLLSNGKIV 205 



RESULT 9 

US-09-543-681A-7638 

; Sequence 7638, Application US/09543681A
; Patent No. 6605709
; GENERAL INFORMATION:
; APPLICANT: GARY BRETON
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO PROTEUS MIRABILIS FOR
; TITLE OF INVENTION: DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 2709.1002-001
; CURRENT APPLICATION NUMBER: US/09/543,681A
; CURRENT FILING DATE: 2000-04-05
; PRIOR APPLICATION NUMBER: US 60/128,706
; PRIOR FILING DATE: 1999-04-09
; NUMBER OF SEQ ID NOS: 8344
; SEQ ID NO 7638
; LENGTH: 373
; TYPE: PRT
; ORGANISM: Proteus mirabilis
US-09-543-681A-7638 



Query Match 7.6%; Score 253.5; DB 4; Length 373;

Best Local Similarity 25.1%; Pred. No. 2.5e-18; 



Matches 89; Conservative 61; Mismatches 133; Indels 71; Gaps 12; 



Qy 44 SYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMS 100 

I : : I I : : I I : : I I I : I I I I : : I : II I I I I I I I I I : : 

Db 13 SIEINH-VTKYFDRT EVLHDVNLTVNSGEMMALLGPSGSGKTTLLRIIAGLE 63 

Qy 101 GRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHY — TALL 154 

I : : I I : : : I : : I I I : I I I : : I I 

Db 64 HQTEGKICFAGQDVSRLHARERKV G FVFQH YAL FRHMT VFEN I AFGLTVL P 114 

Qy 155 AI RRGNPGSFQKKVEAVMAELSLSHVADRLI GNYSLGGI STGERRRVS IAAQLLQDPKVM 214 

II : III : : : I I : I I : I I : : : I I : : I I : I : : : 

Db 115 RRERPNKAAIDKKVTQLLEMIQLPHLAQRYPAQ LSGGQKQRVALARALAVEPQIL 169 

Qy 215 LFDEPTTGLDCMTANQIWLLVELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFC 274 

I I II II : : III : : : : I : : I : I I : I : : 

Db 170 LLDEPFGALDAKVRTELRSWLRELHSELKFTSVFVTHDQQEAMEVADRIVIMGNGKIEQV 229 

Qy 275 GTPAE MLDFFND CGYPCPEHSNPFDFYMDLTSVDTQ 310 

III: : I : I I III II II 

Db 230 GTPQQWHTPESRFVLEFLGDWHLQGEINGAQLQIAGYHLPLSVTP — LYQG — KVDVF 285 

Qy 311 SKEREI ET SKRVQMI ESAYKKS AI CHKTLKNI E — RMKHLKTLPMVPFKTKDSP 362 

: I I : : : I : I I I I I : I : : : I 

Db 286 LRPWEISLNPH S DS LCKLPVKVT EVT PKGH YWQLVLQP I EWGNT P 330 



RESULT 10 

US-09-540-236-3618 

; Sequence 3618, Application US/09540236
; Patent No. 6673910
; GENERAL INFORMATION:
; APPLICANT: Gary L. Breton et al.
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO MORAXELLA CATARRHALIS
; TITLE OF INVENTION: FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 2709.2005-001
; CURRENT APPLICATION NUMBER: US/09/540,236
; CURRENT FILING DATE: 2000-04-04
; NUMBER OF SEQ ID NOS: 3840
; SEQ ID NO 3618
; LENGTH: 245
; TYPE: PRT
; ORGANISM: M. catarrhalis
US-09-540-236-3618 

Query Match 7.5%; Score 249.5; DB 4; Length 245; 

Best Local Similarity 32.3%; Pred. No. 3.2e-18; 

Matches 74; Conservative 49; Mismatches 87; Indels 19; Gaps 9; 

Qy 62 QQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSG — RLGRAGTFLGEVYVNGRA 119 

I : I : : : I I I : I I I : : I I I : I : I I I I : I : : I I : : : I 

Db 19 QRW WEDVS FEI EQGQWGI LGPNGAGKTT S FYMVI GLVPMDKGQVI LGDQDI S KNA 75 

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI RRGNPGSFQK-KVEAVMAELSLS 178 

: I : I : I : : III:: hi I : : I : : : I : I I : I 

Db 76 M-HERAAKGIGYLPQEASIFRKLTVEQNI— MAILQTRKDLTQTEQRQQLEKLMADFHLE 132 



Qy 179 HVADRLIGNYSLG-GISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIWLLVE 237 

II : I I I : I I I I I I II I : I I : I I I I I : I : : : I : : : 

Db 133 HV RHSLGMSVSGGERRRCEIARCLASNPKFILLDEPFAGVDPISVSDIMQVIET 186 

Qy 238 LARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFND 286 

I I |::| I II : I hi h: I |:| I I : 
Db 187 LRERGIGVLITDHNVR-ETLSICQKAYIVSEGKVIAQGNKDEIL — FNE 232 



RESULT 11 

US-09-489-039A-13987 

; Sequence 13987, Application US/09489039A
; Patent No. 6610836
; GENERAL INFORMATION:
; APPLICANT: Gary Breton et. al
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO KLEBSIELLA
; TITLE OF INVENTION: PNEUMONIAE FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: 2709.2004001
; CURRENT APPLICATION NUMBER: US/09/489,039A
; CURRENT FILING DATE: 2000-01-27
; PRIOR APPLICATION NUMBER: US 60/117,747
; PRIOR FILING DATE: 1999-01-29
; NUMBER OF SEQ ID NOS: 14342
; SEQ ID NO 13987
; LENGTH: 344
; TYPE: PRT
; ORGANISM: Klebsiella pneumoniae
US-09-489-039A-13987 



Query Match 7.4%; Score 245; DB 4; Length 344; 

Best Local Similarity 31.7%; Pred. No. 1.8e-17; 

Matches 84; Conservative 44; Mismatches 89; Indels 48; Gaps 12; 

Qy 42 HASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSG 101 

Mil: I I : : : I : I I I I : I I I : I I : I I I : 

Db 13 HVSKSFSRKGHP VLALQHINLSIERGDIFGIIGYSGAGKSTLL-RLIN 59 

Qy 102 RLGRAGTFLGEVYVNGRALRREQFQDC FSYVLQSDTLLSSLTVRETLHY 150 

II I I I I : I I I I I I : I : I I : I I I I 
Db 60 RLETPGE — GEVLLNG EPLQDCSGQRLQAIKKDIGMIFQNFNLLNSKTV FHN 109 



Qy 151 TALLAI RRGNPGS F-QKKVEAVMAELSLSHVADRLI GNYSLGGI STGERRRVS IAAQLLQ 209 

I : I : I : I I : I : : I : I I I : I : I : I I : :: I I I I I 
Db 110 IAIPLILQGRDKAFIQARVAELLAFVDLS DK-IHSYP-NELSGGQKQRVGIARALAT 164 

Qy 210 DPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIWLTIHQPRSELFQLFDKIAILSFG 269 

: I I : I I I I : I I I I I : : I I I : I I I : : I I : : : I : I : : I 

Db 165 NPSVLLCDEATSALDPHTTVQILLLLQEINRRYGITIVLITHEMSVIQKICHKVAVMQAG 224 



Qy 270 ELIFCGTPAEMLDFFNDCGYPCPEH 294

:: I : I I |:| 
Db 225 RIVEQGA VFDLFAQ PQH 241 



RESULT 12 



US-09-134-001C-3731 

; Sequence 3731, Application US/09134001C
; Patent No. 6380370
; GENERAL INFORMATION:
; APPLICANT: Lynn Doucette-Stamm et al
; TITLE OF INVENTION: NUCLEIC ACID AND AMINO ACID SEQUENCES RELATING TO STAPHYLOCOCCUS
; TITLE OF INVENTION: EPIDERMIDIS FOR DIAGNOSTICS AND THERAPEUTICS
; FILE REFERENCE: GTC-007
; CURRENT APPLICATION NUMBER: US/09/134,001C
; CURRENT FILING DATE: 1998-08-13
; PRIOR APPLICATION NUMBER: US 60/064,964
; PRIOR FILING DATE: 1997-11-08
; PRIOR APPLICATION NUMBER: US 60/055,779
; PRIOR FILING DATE: 1997-08-14
; NUMBER OF SEQ ID NOS: 5674
; SEQ ID NO 3731
; LENGTH: 248
; TYPE: PRT
; ORGANISM: Staphylococcus epidermidis
US-09-134-001C-3731 

Query Match 7.3%; Score 244; DB 4; Length 248; 

Best Local Similarity 29.3%; Pred. No. 1.3e-17; 

Matches 70; Conservative 53; Mismatches 92; Indels 24; Gaps 9; 

Qy 67 QILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSG-RLGRAGTFLGEVYVNGRALRRE— 123 

: : : I : I : I : : : : : I I I I I I I I I I : : : II I I I I I : 

Db 20 EVIKGIDLKINQGEWTLIGRSGSGKTTLLRMINALEIPTEGT VYVNGMTYNTKDK 75 

Qy 124 QFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI RRGNPGSFQKKVEAVMAELSLSH 179 

: : I I: I : I : |: ::: I : :: ::|:: I I 

Db 7 6 KSQIKVRQQSGMVFQNYNLFPHKSALENV-MEGLITVKKMNKAT7\NEEAMNLL7VKVGLA^ 134 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

II: : : I I I : : : I I : II I : I I I I I I I I I I : I I I : : : : I II 

Db 135 VKDQ — RPHALSG GQQQRVAIARALAMNPKVMLFDEPTSALDPELVNDVLKVIKELA 189 

Qy 240 RRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPF 298 

: I : I : I : : : : I I : I : III Ml : I I I 
Db 190 DEGMTMVI VTHEMRFAK- EVSNQI AFI HEGVI AEQGT PE DIFN HPKTEELQRF 241 



RESULT 13 
US-08-752-447-2 

; Sequence 2, Application US/08752447
; Patent No. 5994088
; GENERAL INFORMATION:
; APPLICANT: Mechetner, Eugene
; APPLICANT: Roninson, Igor B
; TITLE OF INVENTION: Methods and Reagents for Preparing and
; TITLE OF INVENTION: Using Immunoligcal Agents Specific for P-glycoprotein
; NUMBER OF SEQUENCES: 2
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: McDonnell Boehnen Hulbert & Berghoff Ltd.
; STREET: 300 South Wacker Drive, Seventh Floor
; CITY: Chicago
; STATE: Illinois
; COUNTRY: USA
; ZIP: 60606
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/08/752,447
; FILING DATE: 15-NOV-1996
; CLASSIFICATION: 435
; ATTORNEY/AGENT INFORMATION:
; NAME: Noonan, Kevin E
; REGISTRATION NUMBER: 35,303
; REFERENCE/DOCKET NUMBER: 95,1121
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: 312-913-0001
; TELEFAX: 312-913-9808
; INFORMATION FOR SEQ ID NO: 2:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 1280 amino acids
; TYPE: amino acid
; TOPOLOGY: linear
; MOLECULE TYPE: protein
US-08-752-447-2 

Query Match 7.3%; Score 244; DB 2; Length 1280; 

Best Local Similarity 20.7%; Pred. No. 2.3e-16; 

Matches 153; Conservative 106; Mismatches 229; Indels 250; Gaps 32; 

Qy 41 LHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMS 100 

: I I I I : I I I : : I I : I I I : : : I : I I I I : I : I 

Db 397 VHFSYPSRKEVK 1 LKGLNLKVQS GQTVALVGNSGCGKSTT VQLMQ 441 

Qy 101 GRLGRAGTFLGEVYVNGRALR--REQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRR 158

Db 442 -RL--YDPTEGMVSVDGQDIRTINVRFLREIIGVVSQEPVLFATTIAENIRY- 490

Qy 159 GNPGSFQKKVEAVMAE LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDP 211

I : : I : I : I I I hi : I I : : : I : : I I I : : : I

Db 491 GRENVTMDEIEKAVKEANAYDFIMKLPHKFDTLVGERG-AQLSGGQKQRIAIARALVRNP 549

Qy 212 KVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGEL 271

Db 550 KILLLDEATSALD-TESEAVVQVALDKARKGRTTIVIAH--RFATVRNADVIAGFDDGVI 606

Qy 272 IFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSK 320

Db 607 VEKGNHDELM KEKGIYFKLVTMQTAGNEVELENAADESKSEIDALE 652

Qy 321 RVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPF KTKDSPG 363

Db 653 MSSNDSRSSLIRKRSTRRSVRGSQAQHRKLSTKEALD--ESIPPVSFWRIMKLNLTEWPY 710

Qy 364 VFSK-LGVLLR RVTRN LVRNKLAVITR 389

: I I I : I I I I I I : : I I

Db 711 FVVGVFCAIINGGLQPAFAIIFSKIIGVFTRIDDPETKRQNSNLFSLLFLALGIISFITF 770



Qy 390 LLQNLIMG LFLLFFVLRVRSNV LKGAIQ 417 

II I I : I : I : I : I II I 

Db 771 FLQGFTFGKAGEILTKRLRYMVFRSMLRQDVSWFHDPKNTTGALTTRLANDAAQVKGAIG 830 

Qy 418 DRV GL L YQ FVGAT P YT GMLN AVNL F P VL RAVSDQE 452 

I : I : : I : I : I I : I : : : I : I : : 

Db 831 SRLAVITQNIANLGTGIIISFIYGWQLTLLLLAI — VPI IAI AGWEMKMFAGQALKDKK 888 

Qy 453 SQDGL YQKWQMMLAYALHV LPFSWATM 480 

: I I I : : I I : I I : I I I 

Db 889 ELEGAGKIATEAI ENFRTWSLTQEQKFEHMYAQSLQVPYRNSLRKAHIFGITFSFTQAM 94 8 

Qy 4 81 IFSSV— CYWTLGLHPEVARFGYFSAALLAPHLIGEF — LTLVLLGIVQNPNIVNSV 533 

: : I I : III I I I : I : I I : I II 

Db 949 MYFSYAGCF RFG AYLVAHKLMS FEDVLLVFSAVVFGAMAVGQVS S F 994 

Qy 534 VALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCG 588

I :| I ::: |:: |: I :| |: I I I I I 

Db 995 APDYAKAKISAAHIIM IIEKTPL IDSYSTEGLMPNTLEG-NVTFG 1038 

Qy 589 S S N VS VT TN PMCAFT Q G I 606 

: I I II: 
Db 1039 EWFNYPTRPDIPVLQGL 1056 



RESULT 14 
US-09-316-167-2 

; Sequence 2, Application US/09316167
; Patent No. 6365357
; GENERAL INFORMATION:
; APPLICANT: Mechetner, Eugene
; APPLICANT: Roninson, Igor B
; TITLE OF INVENTION: Methods and Reagents for Preparing and
; TITLE OF INVENTION: Using Immunoligcal Agents Specific for P-glycoprotein
; NUMBER OF SEQUENCES: 2
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: McDonnell Boehnen Hulbert & Berghoff Ltd.
; STREET: 300 South Wacker Drive, Seventh Floor
; CITY: Chicago
; STATE: Illinois
; COUNTRY: USA
; ZIP: 60606
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/09/316,167
; FILING DATE:
; CLASSIFICATION:
; PRIOR APPLICATION DATA:
; APPLICATION NUMBER: 08/752,447
; FILING DATE: 15-NOV-1996
; ATTORNEY/AGENT INFORMATION:
; NAME: Noonan, Kevin E
; REGISTRATION NUMBER: 35,303
; REFERENCE/DOCKET NUMBER: 95,1121
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: 312-913-0001
; TELEFAX: 312-913-9808
; INFORMATION FOR SEQ ID NO: 2:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 1280 amino acids
; TYPE: amino acid
; TOPOLOGY: linear
; MOLECULE TYPE: protein
US-09-316-167-2 



Query Match 7.3%; Score 244; DB 4; Length 1280; 

Best Local Similarity 20.7%; Pred. No. 2.3e-16; 

Matches 153; Conservative 106; Mismatches 229; Indels 250; Gaps 32; 

Qy 41 LHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMS 100 

: I I I I : I I I : : I I : I I I : : : I : I I I I : I : I 

Db 397 VHFSYPSRKEVK 1 LKGLNLKVQS GQT VALVGNS GCGKSTTVQLMQ 441 

Qy 101 GRLGRAGTFLGEVYVNGRALR — REQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRR 158 

II I I I : I : : I : I I : : : I : I : I : I 
Db 442 -RL--YDPTEGMVSVDGQDIRTINVRFLREIIGWSQEPVLFATTIAENIRY 490 

Qy 159 GN PGS FQKKVEAVMAE LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDP 211 

I : : I : I : I I I I : I : I I : : : I : : I I I : : : I 

Db 491 GRENVTMDEIEKAVKEANAYDFIMKLPHKFDTLVGERG-AQLSGGQKQRIAIARALVRNP 549



Qy 212 KVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGEL 271

I::| II I: II : :| : :: ||: I :: I I : III I : 

Db 550 KILLLDEATSALD-TESEAWQVALDKARKGRTTIVIAH — RFATVRNADVIAGFDDGVI 606 

Qy 272 IFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSK 320 

: I |:: I I :: I I |:| : 

Db 607 VEKGNHDELM KEKGI YFKLVTMQTAGNEVELENAADES KS EI DALE 652 



Qy 321 RVQMI ESAYKKSAI CHKTLKNI ERMKHLKTLPMVPF KTKDSPG 363 

I : : : I hi I : :::| I I : I 

Db 653 MS SNDSRS SLI RKRSTRRSVRGSQAQHRKLSTKEALD — ES I PPVS FWRIMKLNLTEWP Y 710 



Qy 364 VFSK-LGVLLR RVTRN LVRNKLAVITR 389 

: I I I : I I I I I I : : I I 

Db 711 FWGVFCAIINGGLQPAFAIIFSKIIGVFTRIDDPETKRQNSNLFSLLFLALGIISFITF 770 



Qy 390 LLQNLIMG LFLLFFVLRVRSNV LKGAIQ 417 

II I I : I : I : I : I I I I 

Db 771 FLQGFTFGKAGEILTKRLRYMVFRSMLRQDVSWFHDPKNTTGALTTRLANDAAQVKGAIG 830 

Qy 418 DRV GLLYQFVGATPYTGMLNAVNLFPVL RAVSDQE 452 

I: I:: I: I : I I : I:: :|: |:: 

Db 831 SRLAVITQNIANLGTGI 1 1 SFI YGWQLTLLLLAI — VP 1 1 AI AGWEMKMFAGQALKDKK 888 

Qy 453 SQDGL YQKWQMMLAYALHV LPFSWATM 480 

: I I I : : I I : I I : | | I 

Db 889 ELEGAGKIATEAI ENFRTWSLTQEQKFEHMYAQSLQVPYRNSLRKAHIFGITFSFTQAM 948 



Qy 481 IFSSV — CYWTLGLHPEVARFGYFSAALLAPHLIGEF — LTLVLLGIVQNPNIVNSV 533 

: : I I : III I I I : I : I I : I II 

Db 949 MYFSYAGCF RFG AYLVAHKLMS FEDVLLVFSAWFGAMAVGQVS S F 994 

Qy 534 VALLS IAGVLVGSGFLRNIQEMPI PFKI I S YFTFQKYCSEI LWNEFYGLNFTCG 588 

I : I I :: : I :: I : I : I I : I I I I I 

Db 995 APDYAKAKISAAHIIM IIEKTPL IDSYSTEGLMPNTLEG-NVTFG 1038 

Qy 589 S S N VS VT TN PMC AFT QG I 606 

: I I II: 

Db 1039 EWFNYPTRPDI PVLQGL 1056 



RESULT 15 
US-09-397-233-2 

; Sequence 2, Application US/09397233
; Patent No. 6630327
; GENERAL INFORMATION:
; APPLICANT: Mechetner, Eugene
; Roninson, Igor B
; TITLE OF INVENTION: Methods and Reagents for Preparing and
; Using Immunological Agents Specific for P-glycoprotein
; NUMBER OF SEQUENCES: 2
; CORRESPONDENCE ADDRESS:
; ADDRESSEE: McDonnell Boehnen Hulbert & Berghoff
; STREET: 300 South Wacker Drive
; CITY: Chicago
; STATE: Illinois
; COUNTRY: USA
; ZIP: 60606
; COMPUTER READABLE FORM:
; MEDIUM TYPE: Floppy disk
; COMPUTER: IBM PC compatible
; OPERATING SYSTEM: PC-DOS/MS-DOS
; SOFTWARE: PatentIn Release #1.0, Version #1.30
; CURRENT APPLICATION DATA:
; APPLICATION NUMBER: US/09/397,233
; FILING DATE: 16-Sep-1999
; CLASSIFICATION: <Unknown>
; ATTORNEY/AGENT INFORMATION:
; NAME: Noonan, Kevin E
; REGISTRATION NUMBER: 35,303
; REFERENCE/DOCKET NUMBER: 95,1121-C
; TELECOMMUNICATION INFORMATION:
; TELEPHONE: 312-913-0001
; TELEFAX: 312-913-0002
; INFORMATION FOR SEQ ID NO: 2:
; SEQUENCE CHARACTERISTICS:
; LENGTH: 1280 amino acids
; TYPE: amino acid
; TOPOLOGY: linear
; MOLECULE TYPE: protein
; SEQUENCE DESCRIPTION: SEQ ID NO: 2:
US-09-397-233-2 



Query Match 7.3%; Score 244; DB 4; Length 1280; 

Best Local Similarity 20.7%; Pred. No. 2.3e-16; 

Matches 153; Conservative 106; Mismatches 229; Indels 250; Gaps 32;



Qy 41 LHAS YS VSHRVRPWWDI T S CRQQWTRQI LKDVS L YVES GQ IMC I LGS S GS GKTTLLDAMS 100 

: I I I I : | | | : : | | : M I : : : I : I I I I : I : I 

Db 397 VHFSYPSRKEVK ILKGLNLKVQSGQTVALVGNSGCGKSTTVQLMQ 441 

Qy 101 GRLGRAGTFLGEVYVNGRALR — REQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRR 158 

II | I I : I : : I : I I : : : I : I : I : I 
Db 442 -RL — YDPTEGMVSVDGQDIRTINVRFLREIIGWSQEPVLFATTIAENIRY 490 

Qy 159 GNPGSFQKKVEAVMAE LSLSHVADRLI GNYSLGGI STGERRRVS IAAQLLQDP 211 

I ::| : I : I I I |:| : I I : : : I : : I I I : : : I 

Db 491 GRENVTMDEIEKAVKEANAYDFIMKLPHKFDTLVGERG-AQLSGGQKQRIAIARALVRNP 549 

Qy 212 KVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGEL 271

I : : I I I I : I I : : I : : : M : I : : I I : III I : 

Db 550 KI LLLDEAT SALD-TES EAVVQVALDKARKGRTT I VI AH — RFATVRNADVI AGFDDGVI 606 

Qy 272 IFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSK 320 

: I I:: I I :: I I hi : 

Db 607 VEKGNHDELM KEKGIYFKLVTMQTAGNEVELENAADESKSEIDALE 652 

Qy 321 RVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPF — KTKDSPG 363 

I : : : I hi I : :::| I I : I 

Db 653 MSSNDSRSSLIRKRSTRRSVRGSQAQHRKLSTKEALD — ESIPPVSFWRIMKLNLTEWPY 710 

Qy 364 VFSK-LGVLLR RVTRN LVRNKLAVITR 389 

: I I I : I I I I I I : : I I 

Db 711 FWGVFCAI INGGLQPAFAI I FSKI IGVFTRIDDPETKRQNSNLFSLLFLALGI I S FITF 770 

Qy 390 LLQNLIMG LFLLFFVLRVRSNV LKGAIQ 417 

II I | : | : | : | : | | | | 

Db 771 FLQGFTFGKAGEILTKRLRYMVFRSMLRQDVSWFHDPKNTTGALTTRLANDAAQVKGAIG 830 

Qy 418 DRV GL L YQ FVGAT P YT GMLNAVN L F PVL RAVSDQE 452 

h h : h I : I h I : : : I : h : 

Db 831 SRLAVITQNIANLGTGIIISFIYGWQLTLLLLAI — VPIIAIAGWEMKMFAGQALKDKK 888 

Qy 453 SQDGL YQKWQMMLAYALHV LPFSWATM 480 

: I I h: I I : I I : I I I 

Db 889 ELEGAGKIATEAI ENFRTWSLTQEQKFEHMYAQSLQVPYRNSLRKAHIFGITFSFTQAM 948 

Qy 481 IFSSV— CYWTLGLHPEVARFGYFSAALLAPHLIGEF — LTLVLLGIVQNPNIVNSV 533 

:: I h III I I I : I : I I : I II 

Db 94 9 MYFSYAGCF RFG AYLVAHKLMS FEDVLLVFSAWFGAMAVGQVS S F 994 

Qy 534 VALLS I AGVLVGS GFLRNI QEMP I P FKI I S YFT FQKYC S EI LWNEFYGLNFTCG 588 

I : I I :: : h : h I : I I : I I I I I 

Db 995 APDYAKAKISAAHIIM IIEKTPL IDSYSTEGLMPNTLEG-NVTFG 1038 

Qy 589 SSNVSVTTNPMCAFTQGI 606 

: I I II: 

Db 1039 EWFNYPTRPDI PVLQGL 1056 

Search completed: February 27, 2004, 07:20:16 

Job time : 16.7508 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model

Run on: February 27, 2004, 06:44:33 ; Search time 14.5049 Seconds
(without alignments)
4317.206 Million cell updates/sec

Title: US-09-989-981A-6
Perfect score: 3326
Sequence: 1 MGDLSSLTPGGSMGLQVNRG...PALVILGIWFKIRDHLISR 651

Scoring table: BLOSUM62
Gapop 10.0 , Gapext 0.5

Searched: 283366 seqs, 96191526 residues

Total number of hits satisfying chosen parameters: 283366

Minimum DB seq length: 0
Maximum DB seq length: 200000.0000

Post-processing: Minimum Match 0%
Maximum Match 100%
Listing first 45 summaries

Database: PIR_78:*
pir1:*
pir2:*
pir3:*
pir4:*


Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
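
The summary figures printed with each result can be cross-checked by simple arithmetic. The sketch below is not taken from GenCore documentation: the three formulas are assumptions inferred from the numbers in this report (Query Match as score divided by perfect score, Best Local Similarity as matches divided by all alignment columns, and the cell-update rate as query length times database residues divided by search time), and the function names are hypothetical.

```python
# Sketch: cross-check summary figures printed in this search report.
# All three formulas are inferred from the report's own numbers; they are
# assumptions, not documented GenCore definitions.

def query_match_pct(score: float, perfect_score: float) -> float:
    """Alignment score as a percentage of the query's perfect (self) score."""
    return 100.0 * score / perfect_score


def best_local_similarity_pct(matches: int, conservative: int,
                              mismatches: int, indels: int) -> float:
    """Identical positions as a percentage of all alignment columns."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


def million_cell_updates_per_sec(query_len: int, db_residues: int,
                                 seconds: float) -> float:
    """Dynamic-programming cells filled per second, in millions."""
    return query_len * db_residues / seconds / 1e6


if __name__ == "__main__":
    # PIR search, RESULT 1 (JC7860): Score 668.5 against perfect score 3326.
    print(round(query_match_pct(668.5, 3326), 1))                  # 20.1
    # Same result: Matches 180, Conservative 144, Mismatches 252, Indels 55.
    print(round(best_local_similarity_pct(180, 144, 252, 55), 1))  # 28.5
    # 651-residue query, 96191526 database residues, 14.5049 s search time;
    # the value comes out close to the reported 4317.206.
    print(million_cell_updates_per_sec(651, 96191526, 14.5049))
```

Recomputing these values is also a convenient way to catch digits damaged in transcription, since a mistyped score or count will no longer agree with the printed percentage.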
ALIGNMENTS



RESULT 1 
JC7860 

brain multidrug resistance protein, BMDP - pig
C;Species: Sus scrofa domestica (domestic pig)
C;Date: 18-Nov-2002 #sequence_revision 18-Nov-2002 #text_change 31-Mar-2003
C;Accession: JC7860
R;Eisenblaetter, T.; Galla, H.J.
Biochem. Biophys. Res. Commun. 293, 1273-1278, 2002
A;Title: A new multidrug resistance protein at the blood-brain barrier.
A;Reference number: JC7860; MUID:22050127; PMID:12054514
A;Accession: JC7860
A;Molecule type: mRNA
A;Residues: 1-656 <EIS>
A;Cross-references: GB:AJ420927
A;Experimental source: brain
C;Comment: This protein, a new transport protein of the ATP-binding cassette
(ABC) superfamily of transporters, expressed in porcine brain capillary
endothelial cells, plays an important role in the exclusion of xenobiotics from
the brain and participates in drug transport across the blood-brain barrier and
therefore is considered as an efflux pump at the cerebral endothelium.
C;Genetics:
A;Gene: bmdp



Query Match 20.1%; Score 668.5; DB 2; Length 656; 

Best Local Similarity 28.5%; Pred. No. 3.4e-42; 

Matches 180; Conservative 144; Mismatches 252; Indels 55; Gaps 18; 

Qy 13 MGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDV 72 

::::::: I I : : : : | : : : : : | | : | | : : : I I : : 

Db 8 VS I PMS KRNTNGLPGS S SNELKT S AGGAVLS FHDI CYRVKVKSGFLFCRKTVEKEI LTNI 67 

Qy 73 SLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYV 132 

: : : I : I I I : I I I : : I I I : : I I I : I : I I I I I : II 

Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPHG-LSGDVLING-APRPANFKCNSGYV 124 

Qy 133 LQSDTLLS SLTVRETLHYTALLAI RRGNPGS F QKKVEAVMAEL S L S HVADRL I GN 187 

: I I :: : I I I I I I :: I I : I : : : : I : I I I III : I 

Db 125 VQDDWMGTLTVRENLQFSAALRL PTTMTNHEKNERINMVI QELGLDKVADS KVGT 180 

Qy 188 YSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVL 247

: I : I I I I : I I I I : I : I I : : I I I I I I I I I I I : : : I I : : : : I : : 
Db 181 QFIRGVSGGERKRTSIAMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIF 240 

Qy 248 TIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSV 307 

: I I I I I : I : I I I : : I : I I : I | I I : I III : : I I I I : : I : : 
Db 241 S IHQPRYS I FKLFDSLTLLASGRLMFHGPAREALGYFAS IGYNCEPYNNPADFFLDVING 300 

Qy 308 DTQ S KERE I ET S KRVQMI E SAYKKSAICHKTLKNIE RMK 346 

I : : : I I I : I : : I I : I : : : I 

Db 301 DSSAWLSRADRDEGAQEPEEPPEKDTPLIDKLAAFYTNSSFFKDTKVELDQFSGGRKKK 360 

Qy 347 HLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL— LFFV 404 

I : I : I : I I : I I : I I : : : : : I : I I : : I : 

Db 361 KS SVYKEVT YTT SFCHQLRWISRRSFKNLLGNPQASVAQIIVTIILGLVIGAI FYD 416 

Qy 405 LRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMM 464

I : I I | | : | | : | : | : : : I I I I : : I I I : 

Db 417 LK NDPSG-IQNRAGVLF-FLTTNQCFSSVSAVELLWEKKLFIHEYISGYYRVSSYF 471 

Qy 465 LAYAL-HVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGI 523

I : I I : : : : I I : : I : I | | | | I I : : : : : I I 
Db 472 FGKLLSDLLPMRMLPS 1 1 FTCITYFLLGLKPAVGS FFIMMFTLM MVAYS AS SMALAI 528 

Qy 524 VQNPNIVNSWALLSIAGV — LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFY 581 

::h |::|: I :: || I |:: : : ||: :| I III 

Db 529 AAGQS WSVATLLMTI S FVFMMI FSGLLVNLKTWPWLSWLQYFS I PRYGFSALQYNEFL 588 

Qy 582 GLNFTCGSSNVSVTTNPMCAFT--QGIQFIE 610

III I : : I I I I I : I I : : : I 

Db 589 GQNFGPG---LNVTTNNTCSFAICTGAEYLE 616
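
The summary lines that precede each alignment can be cross-checked mechanically. In the hits listed here, Best Local Similarity is consistent with Matches divided by the sum of Matches, Conservative, Mismatches and Indels (for the hit above, 180/631, about 28.5%). The sketch below parses one summary and recomputes that figure. The helper names `parse_summary` and `local_similarity` are hypothetical, and the relation is inferred from the numbers in this listing, not taken from documentation of the search program.

```python
def parse_summary(text):
    """Turn 'Key value;' pairs from a hit summary into a dict of floats."""
    fields = {}
    for chunk in text.replace("\n", " ").split(";"):
        chunk = chunk.strip()
        if not chunk:
            continue
        # The value is the last whitespace-separated token; the rest is the key.
        key, value = chunk.rsplit(None, 1)
        fields[key] = float(value.rstrip("%"))
    return fields


def local_similarity(fields):
    """Matches as a percentage of all aligned columns (assumed relation)."""
    columns = sum(fields[k] for k in ("Matches", "Conservative", "Mismatches", "Indels"))
    return 100.0 * fields["Matches"] / columns


summary = """Query Match 20.1%; Score 668.5; DB 2; Length 656;
Best Local Similarity 28.5%; Pred. No. 3.4e-42;
Matches 180; Conservative 144; Mismatches 252; Indels 55; Gaps 18;"""

fields = parse_summary(summary)
print(round(local_similarity(fields), 1), fields["Best Local Similarity"])
```

A mismatch between the recomputed and reported values would point to a digit misread in one of the counts.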



RESULT 2 
T47652 

ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T26I12.10 

C; Species: Arabidopsis thaliana (mouse-ear cress) 



C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000 
C;Accession: T47652

R;Monfort, A.; Casacuberta, E.; Puigdomenech, P.; Mewes, H.W.; Lemcke, K.;

Mayer, K.F.X.; Quetier, F.; Salanoubat, M. 

submitted to the Protein Sequence Database, February 2000 

A;Reference number: Z24471 

A; Accession: T47652 

A; Status: preliminary 

A;Molecule type: DNA 

A; Residues: 1-725 <MON> 

A;Cross-references: EMBL:AL132954

A;Experimental source: cultivar Columbia; BAC clone T26I12
C; Genetics : 
A;Map position: 3 
A;Note: T26I12.10 

C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology 

Query Match 19.5%; Score 649.5; DB 2; Length 725; 

Best Local Similarity 29.4%; Pred. No. 1.1e-40;

Matches 182; Conservative 124; Mismatches 246; Indels 67; Gaps 15; 

Qy 33 PEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGK 92 

I I : I : I I : I I II : : I I I I I I : : I I : I I : I I 

Db 68 PVPYVLNFNNLQYDVTLRRR FGFSRQNGVKTLLDDVSGEASDGDILAVLGASGAGK 123 

Qy 93 TTLLDAMSGRLGRAGTFLGEVYVNG-RALRREQFQDCFSYVLQSDTLLSSLTVRETLHYT 151

: I I : I I : : I I : I : I I : I I : I : : : I I : I I I I I I : I I I : 
Db 124 STLIDALAGRVAE-GSLRGSVTLNGEKVXQSRLLKVISAYVMQDDLLFPMLTVKETLMFA 182

Qy 152 ALLAIRRG-NPGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQD 210

: : I : : : I I I : : : I I : I : : I I : I : I I I I I I I I I : : I 

Db 183 SEFRLPRSLSKSKKMERVEALIDQLGLRNAANTVTGDEGHRGVSGGERRRVSIGIDIIHD 242

Qy 211 PKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGE 270

II: I I I I : I I I I : I : I : I : I I : : : I I II : : : I I : : I I I I : 
Db 243 PIVLFLDEPTSGLDSTNAFMWQVLKRIAQSGSIVIMSIHQPSARIVELLDRLIILSRGK 302 

Qy 271 LIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETS 319 

: I I : I I : I I : I I I I I I : I : I I I I : I I 

Db 303 SVFNGSPASLPGFFSDFGRPIPEKENISEFALDLV RELEGSNEGTKALVDFN 354 

Qy 320 KRVQMIESAYK KSAICHKTL — KN I E RMKH LKTL PMVP FKT KD 360 

: : : I : I I : I I I I : I : 

Db 355 EKWQQNKISLIQSAPQTNKLDQDRSLSLKEAINASVSRGKLVSGSSRSNPTSMETVSSYA 414 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420

: I : I : I : I : I : I III: : : I II I : : I I I : I : 

Db 415 NPSLFETF-ILAKRYMKNWIRMPELVGTRIATVMWGC-LLATVYWKLDHTPRGA-QERL 471 

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480

I : I I I : I : I : I I : : I : I : : : : : : I II : : : 

Db 472 -TLFAFWPTMFYCCLDNVPVFIQERYIFLRETTHNAYRTSSYVISHSLVSLPQLLAPSL 530 

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNS-VVALLSI 539

: I I : : : I I : I I : I : : : I I : : I : I III: : I : : : 

Db 531 VFSAI T FWTVGLS GGLEGFVFYCLLI YAS FWSGS SWTFI S GW — PNIMLCYMVSITYL 588 



Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

11:1111 : I : I : II I : : : I I I : I 

Db 589 AYCLLLSGFYVNRDRI PFYWTWFHYI S I LKYP YEAVLINEF DDPS 633 

Qy 600 CAFTQGIQFIEKTCPGATS 618 

I : I : I : I I I 
Db 634 RCFVRGVQVFDSTLLGGVS 652



RESULT 3 
S77690 

probable membrane protein YOL075c - yeast (Saccharomyces cerevisiae)
N;Alternate names: hypothetical protein OH25; hypothetical protein O1130; 
hypothetical protein YOL074c 
C; Species: Saccharomyces cerevisiae 

C;Date: 21-Apr-1997 #sequence_revision 09-May-1997 #text_change 19-Apr-2002 

C;Accession: S77690; S66767; S66768 

R;Alexandraki, D.; Katsoulou, C.; Tzermia, M.

submitted to the Protein Sequence Database, July 1996 

A; Reference number: S66756 

A; Accession: S77690 

A; Molecule type: DNA 

A; Residues: 1-1294 <ALE> 

A; Cross-references: EMBL:Z74816; MIPS:YOL075c 

A; Note: this is a revision to the sequence from reference S66756 
A;Accession: S66767 
A;Molecule type: DNA 

A;Residues: 1-179,'TTRTGVFLWKRED' <ALW>

A;Cross-references: EMBL:Z74816

A; Experimental source: strain S288C 

A;Note: this sequence has been revised in reference S77690 

A; Note: this was assumed to be protein YOL074c 

A;Accession: S66768 

A; Molecule type: DNA 

A; Residues: 200-1294 <ALF> 

A;Cross-references: EMBL:Z74817

A; Experimental source: strain S288C 

A; Note: this sequence has been revised in reference S77690 

A; Note: this was assumed to be the complete sequence of protein YOL075c 

C; Genetics: 

A;Cross-references: SGD:S0005435
A;Map position: 15L 
A; Note: YOL075c 

C;Superfamily: unassigned ATP-binding cassette proteins; ATP-binding cassette
homology 

C; Keywords: ATP; nucleotide binding; P-loop; transmembrane protein 

F;45-263/Domain: ATP-binding cassette homology <ABC1>

F;62-69/Region: nucleotide-binding motif A (P-loop)

F;376-392/Domain: transmembrane #status predicted <TM1>

F;469-485/Domain: transmembrane #status predicted <TM2>

F;496-512/Domain: transmembrane #status predicted <TM3>

F;606-622/Domain: transmembrane #status predicted <TM4>

F;710-916/Domain: ATP-binding cassette homology <ABC2>

F;727-734/Region: nucleotide-binding motif A (P-loop)

F;1042-1058/Domain: transmembrane #status predicted <TM5>

F;1125-1141/Domain: transmembrane #status predicted <TM6>

F;1177-1193/Domain: transmembrane #status predicted <TM7>

F;1269-1285/Domain: transmembrane #status predicted <TM8>
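
Feature lines of the form `F;start-end/Kind: description <LABEL>` are regular enough to be read by a small parser, which is one way to confirm that a repaired range is plausible (a P-loop spanning 62-69 is 8 residues long, for instance). The following is a minimal sketch under that assumed line layout. `parse_feature` and `span_length` are hypothetical names, not part of any database toolkit.

```python
import re

# Assumed layout: F;<location>/<kind>: <description> [<LABEL>]
FEATURE = re.compile(
    r"^F;\s*([0-9, -]+?)\s*/\s*([^:]+?)\s*:\s*(.*?)\s*(?:<(\w+)>)?\s*$"
)


def parse_feature(line):
    """Split one feature line into location, kind, description and label."""
    m = FEATURE.match(line)
    if m is None:
        return None
    location, kind, description, label = m.groups()
    return {
        "location": location.replace(" ", ""),  # drop stray spaces inside numbers
        "kind": kind,
        "description": description,
        "label": label,
    }


def span_length(location):
    """Length in residues of a simple 'start-end' location."""
    start, end = (int(x) for x in location.split("-"))
    return end - start + 1


feature = parse_feature("F;376-392/Domain: transmembrane #status predicted <TM1>")
ploop = parse_feature("F;62-69/Region: nucleotide-binding motif A (P-loop)")
print(feature["label"], span_length(feature["location"]))
print(ploop["kind"], span_length(ploop["location"]))
```

Comma-separated site lists (such as the binding-site lines) parse as a location string but are not handled by `span_length`, which only accepts a single range.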

Query Match 18.9%; Score 627; DB 2; Length 1294; 

Best Local Similarity 31.7%; Pred. No. 1.2e-38; 

Matches 181; Conservative 106; Mismatches 228; Indels 56; Gaps 19; 

Qy 65 TRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFL-----GEVYVNGRA 119

I : : I I : I : : I I I : I I I I I I : : I i : : I II I : : I I : I 

Db 706 TKEILQSVNAIFKPGMINAIMGPSGSGKSSLLNLISGRL-KSSVFAKFDTSGSIMFNDIQ 764 

Qy 120 LRREQFQDCFSYVLQSDT-LLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLS 178



Db 765 VSELMFKNVCSYVSQDDDHLLAALTVKETLKYAAALRLHHLTEAERMERTDNLIRSLGLK 824

Qy 179 HVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVEL 238

I : : I I I : I I I II : I I I : : II I II : : I I I I I : I II I : I : : I : I 

Db 825 HCENNIIGNEFVKGISGGEKRRVTMGVQLLNDPPILLLDEPTSGLDSFTSATILEILEKL 884 

Qy 239 AR-RNRIVVLTIHQPRSELFQLFDKIAILS-FGELIFCGTPAEMLDFFNDCGYPCPEHSN 296

I : : : : : I I I I I II I I I : I : : I : I I I : I I I : : I : I I I I : I 

Db 885 CREQGKTIIITIHQPRSELFKRFGNVLLLAKSGRTAFNGSPDEMIAYFTELGYNCPSFTN 944 

Qy 297 PFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPF 356 

I I : : I I I I : I I : : : I I : I I : I I I : I I : : : I I : 

Db 945 VADFFLDLISVNTQNEQNEISSRARVEKILSAWK ANMDN-ESLSPTPISEK 994 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLV RNKLAVITRLLQNLIMGLFLL 401 

: I : : : I : III I : : : : I : I : I : 

Db 995 QQYSQESFFTEYSEFVRK-PANLVLAYIVNVKRQFTTTRRSFDSLMARIAQIPGLGVIFA 1053 

Qy 402 FFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKW 461

I I : I : I : I : I I : I : II I : : I I : I I : I 

Db 1054 LFFAPVKHNYT — SISNRLGLAQEST-ALYFVGMLGNLACYPTERDYFYEEYNDNVYGIA 1110 

Qy 462 QMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLI GEFLTL 518 



Db 1111 PFFLAYMTLELPLSALASVLYAVFTVLACGL-PRTA — GNFFATVYCSFIVTCCGERLGI 1167 

Qy 519 VLLGIVQNPN-IVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVV 577

: : | : | | : : | | | : | | I | : | III:: 

Db 1168 MTNTFFERPGFWNCI SI ILSIGTQMSGLMSL GMSRVLKGFNYLNPVGYTSMIIIN 1223 

Qy 578 NEFYG-LNFTC--GSSNVSVTTNPMCAFTQG 605

I I I I I I I I III 
Db 1224 FAFPGNLKLTCEDGGKNSDGT CEFANG 1250 



RESULT 4 
S19421 

ATP-dependent permease ADP1 precursor - yeast (Saccharomyces cerevisiae) 
N;Alternate names: protein YCR011c; protein YCR105
C; Species: Saccharomyces cerevisiae 

C;Date: 31-Mar-1992 #sequence_revision 31-Mar-1992 #text_change 19-Jan-2001 
C;Accession: S19421; S40914 
R;Goffeau, A.; Purnelle, B.; Skala, J.

submitted to the Protein Sequence Database, March 1992 



A;Reference number: S19420

A; Accession: S19421 

A; Molecule type: DNA 

A; Residues: 1-1049 <GOF> 

A;Cross-references: EMBL:X59720; NID:g1907116; PIDN:CAA42328.1; PID:g1907154;
GSPDB:GN00003; MIPS:YCR011c
R;Purnelle, B.; Skala, J.; Goffeau, A. 
Yeast 7, 867-872, 1991 

A; Title: The product of the YCR105 gene located on the chromosome III from 
Saccharomyces cerevisiae presents homologies to ATP-dependent permeases. 
A;Reference number: S40914; MUID:92160395; PMID:1789009
A; Accession: S40914 

A; Status: not compared with conceptual translation 

A; Molecule type: DNA 

A; Residues: 1-1049 <PUR> 

R; Skala, J.; Purnelle, B.; Goffeau, A. 

Yeast 8, 409-417, 1992 

A; Title: The complete sequence of a 10.8 kb segment distal of SUF2 on the right 

arm of chromosome III from Saccharomyces cerevisiae reveals seven open reading 

frames including the RVS161, ADP1 and PGK genes. 

A;Reference number: S25353; MUID:92327849; PMID:1626432

A; Contents: annotation 

C; Genetics : 

A;Gene: SGD:ADP1; MIPS:YCR011c 

A;Cross-references: SGD:S0000604; MIPS:YCR011c
A;Map position: 3R 

C;Superfamily: ATP-dependent permease ADP1; ATP-binding cassette homology

C; Keywords: ATP; glycoprotein; nucleotide binding; P-loop; transmembrane protein 

F;1-25/Domain: signal sequence #status predicted <SIG>

F;26-1049/Product: ATP-dependent permease ADP1 #status predicted <MAT>

F;26-324/Domain: extracellular #status predicted <EXT>

F;325-341/Domain: transmembrane #status predicted <TM1>

F;406-607/Domain: ATP-binding cassette homology <ABC>

F;423-430/Region: nucleotide-binding motif A (P-loop)

F;550-557/Region: nucleotide-binding motif B

F;794-810/Domain: transmembrane #status predicted <TM2>

F;829-845/Domain: transmembrane #status predicted <TM3>

F;878-894/Domain: transmembrane #status predicted <TM4>

F;909-925/Domain: transmembrane #status predicted <TM5>

F;938-954/Domain: transmembrane #status predicted <TM6>

F;1025-1041/Domain: transmembrane #status predicted <TM7>

F;50,114,165,221/Binding site: carbohydrate (Asn) (covalent) #status predicted
F;429/Binding site: ATP (Lys) #status predicted

Query Match 18.7%; Score 621; DB 1; Length 1049; 

Best Local Similarity 28.6%; Pred. No. 2.5e-38; 

Matches 196; Conservative 111; Mismatches 223; Indels 156; Gaps 22; 

Qy 68 ILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQD 127 

: I : : I I : I I I : I : I I I : I I I I I I I : : : : I I : I I I : : I : I 
Db 405 VLNEISGIVKPGQILAIMGGSGAGKTTLLDILAMK-RKTGHVSGSIKVNGISMDRKSFSK 463

Qy 128 CFSYVLQSDTLLSSLTVRETLHYTALLAI RRGNPGSFQKK -VEAVMAELSLSHVADRL 184 

: I I I I I : I I I I I : : I I I : : I I : I I I : I I : : I I : 

Db 464 IIGFVDQDDFLLPTLTVFETVLNSALLRLPKAL--SFEAKKARVYKVLEELRIIDIKDRI 521



Qy 185 IGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR-RNR 243



Db 522 IGNEFDRGISGGEKRRVSIACELVTSPLVLFLDEPTSGLDASNANNVIECLVRLSSDYNR 581 

Qy 244 IVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMD 303

: I I : I I I I I I : I I I I I: : I I I I : : : I : : : I ' II II— I I : : I 

Db 582 TLVLSIHQPRSNIFYLFDKLVLLSKGEMVYSGNAKKVSEFLRNEGYICPDNYNIADYLID 641 

Qy 304 LT-SVDTQSKEREI 316 

Db 642 ITFEAGPQGKRRRIRNISDLEAGTDTNDIDNTIHQTTFTSSDGTTQREWAHLAAHRDEIR 701 

Qy 317 ETSKRVQMI ESAYKKSAI CHKTLKNI ERM 345 

I ::: III : : I I : 

Db 702 SLLRDEEDVEGTDGRRGATEIDLNTKLLHDKYKDSVYYAELSQEIEEVLSEGDEESNVLN 761 

Qy 346 KHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVL 405 

II : I I :| :| | :|: | | :: | ::: | I I 

Db 762 GDLPT GQQSAGFLQQLS I LNSRS FKNMYRNPKLLLGNYLLTI LLSLFLGTLYY 814 

Qy 406 RVRSNVLKGAIQDRVGLLY---QFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQ 462

I I I : I I : I': I I : : I : I I : : I : I : : I : I 
Db 815 NV-SNDISG-FQNRMGLFFFILTYFGFVTFTGL S S FALERI I FI KERSNN YYS P — 866 

Qy 463 MMLAYAL HVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLT 517 

III: I : I I I : : I : I I I : : I : : I : I I 

Db 867 — LAYYI SKIMS EWPLRWPPILLSLIVYPMTGLNMKDNAF-FKCIGILILFNLGISLE 923 

Qy 518 LVLLGIV QNPNIVNSWALLSIAGVLVGSGFLRNIQEMP-IPFKIISYFTFQKYCSE 573 

: : : I I : I : I : I I : I I I I : I I I : : : I I : I : I I 
Db 924 I LTI GI I FEDLNNS 1 1 LSVLVLL GSLLFSGLFINTKNITNVAFKYLKNFSVFYYAYE 98 0 

Qy 574 ILWNEF YGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF 624 

I : : I I I I I I I I I I I I 

Db 981 SLLINEVKTLMLKERKYGLNI EVPGATILSTFGF 1014 

Qy 625 LI LYSFI PALVT LGI — WFKI RDHL 648 

: : : : : I I : I I I I : I 
Db 1015 - WQNLVFDI KI LALFNWFLIMGYL 1039 



RESULT 5 
T47648 

ABC transporter-like protein - Arabidopsis thaliana 

N;Alternate names: protein T15C9.80 

C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000
C;Accession: T47648

R;Mewes, H.W.; Rudd, S.; Lemcke, K.; Mayer, K.F.X.
submitted to the Protein Sequence Database, April 2000 
A; Reference number: Z24470 
A;Accession: T47648 
A; Status: preliminary 
A; Molecule type: DNA 
A; Residues: 1-720 <MEW> 
A;Cross-references: EMBL:AL132970

A; Experimental source: cultivar Columbia; BAC clone T15C9 
C; Genetics : 



A; Map position: 3 
A;Note: T15C9.80 

C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology 

Query Match 18.5%; Score 614; DB 2; Length 720; 

Best Local Similarity 28.7%; Pred. No. 4.9e-38; 

Matches 182; Conservative 122; Mismatches 254; Indels 76; Gaps 17; 

Qy 23 S SLEG — APATAPEPHSLGI LHAS YSVS HRVRPWWDITSCRQQWTRQILKDVS 73 

I I I : I II I : : I : I I I : I I : : I : : I : : I 
Db 40 SSLDGDNDHLMRPVPFVLSFNNLTYNVSVRRKLDFHDLVPWRRTSFSK TKTLLDNIS 96 

Qy 74 LYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVL 133

I : I : : I I : I I I I I : I I : I I :: I : : I : I I : I I I I : : : I I : 

Db 97 GETRDGEILAVXGASGSGKSTLIDAIANRIAK-GSLKGTVTLNGEALQSRMLW 155 

Qy 134 QSDTLLSSLTVRETLHYTALLAIRRGNPGSFQK-KVEAVMAELSLSHVADRLIGNYSLGG 192

III I I I I I I : I : I | I : I : I : I : : : I : : I : I I : I 
Db 156 QDDLLFPMLTVEETLMFAAEFRLPRSLPKSKKKLRVQALIDQLGIRNAAKTIIGDEGHRG 215 

Qy 193 ISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQP 252

I I I I I I I I I I : : I I I : I I I I : I I I : I : I : I : I I : : : : I I I I 

Db 216 ISGGERRRVSIGIDIIHDPIVLFLDEPTSGLDSTSAFMWKVLKRIAESGSIIIMSIHQP 275 

Qy 253 RSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTS 306 

: II:: III : I I : I I : II I Ml: I : I : I I 
Db 276 SHRVLSLLDRLIFLSRGHTVFSGSPASLPSFFAGFGNPIPENENQTEFALDLIRELEGSA 335 

Qy 307 VDTQSKEREIETSKRVQMIESAYKKSAICHKTLK NIERMKHLK 349 

I : I : I :: I : : : I I I : I I I : 

Db 336 GGTRGLVEFNKKWQEMKKQSNPQTLTPPASPNP — NLTLKEAI SASISRGKLVSGGGGGS 393 

Qy 350 TLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLL 401 

11:1 : :: | || | I : || :: I |:| 

Db 394 S VI NHGGGT LAVPAFAN P FWI EI KTLTRRS I LNSRRQPELLGMRLAT VTVTG- FIL 448 

Qy 402 FFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKW 461

I | | | : | : | : | : | : I : :|: :| I : :|: |:: 

Db 449 ATVFWRLDNSPKG-VQERLG-FFAFAMSTMFYTCADALPVFLQERYIFMRETAYNAYRRS 506 

Qy 462 QMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLL 521

: I : : I : I : : : I : : I : I I : I : : : I I I I 

Db 507 SWLSHAIVTFPSLIFLSIAFAWTFWAVGLEGGLMGFLFYCLIIIASFWSGSSFVTFLS 566 

Qy 522 GIVQNPNIV NSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWN 578 

1:1 I : : : I I I : I I hill I : I : I : II I : : I 

Db 567 GW--PHVMLGYTIWAIL — AYFLLFSGFFINRDRIPQYWIWFHYLSLVKYPYEAVLQN 622 

Qy 579 EFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKT 612

II : : I | : I : | : : 

Db 623 EF SDPTECFVRGVQLFDNS 641 

RESULT 6 
C84423 

probable ABC transporter [imported] - Arabidopsis thaliana 



C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 02-Feb-2001 #sequence_revision 02-Feb-2001 #text_change 02-Feb-2001 
C;Accession: C84423 

R;Lin, X.; Kaul, S.; Rounsley, S.D.; Shea, T.P.; Benito, M.I.; Town, C.D.;
Fujii, C.Y.; Mason, T.M.; Bowman, C.L.; Barnstead, M.E.; Feldblyum, T.V.; Buell,
C.R.; Ketchum, K.A.; Lee, J.J.; Ronning, C.M.; Koo, H.; Moffat, K.S.; Cronin,
L.A.; Shen, M.; VanAken, S.E.; Umayam, L.; Tallon, L.J.; Gill, J.E.; Adams,
M.D.; Carrera, A.J.; Creasy, T.H.; Goodman, H.M.; Somerville, C.R.; Copenhaver,
G.P.; Preuss, D.; Nierman, W.C.; White, O.; Eisen, J.A.; Salzberg, S.L.; Fraser,
C.M.; Venter, J.C.
Nature 402, 761-768, 1999

A;Title: Sequence and analysis of chromosome 2 of the plant Arabidopsis
thaliana.

A; Reference number: A84420; MUID: 20083487; PMID: 10617197 
A; Accession: C84423 
A; Status: preliminary 
A;Molecule type: DNA 
A; Residues: 1-725 <STO> 

A;Cross-references: GB:AE002093; NID:g4262239; PIDN:AAD14532.1; GSPDB:GN00139

C; Genetics : 

A; Gene: At2g01320 

A; Map position: 2 



Query Match 18.4%; Score 610.5; DB 2; Length 725;

Best Local Similarity 29.7%; Pred. No. 9e-38;

Matches 166; Conservative 111; Mismatches 244; Indels 37; Gaps 11;



Qy 51 VRP WWDITSC RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR 102

: I I I : I I : I : I I : I I : I : : : I : I I I I I I I I I I : : : I :

Db 65 IRPVTIRWRNITCSLSDKSSKSVRFLLKNVSGEAKPGRLLAIMGPSGSGKTTLLNVLAGQ 124

Qy 103 LGRAGT--FLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRR-G 159

I : I : I I I : : : : : : I I I I I I M I I I : I I :

Db 125 LSLSPRLHLSGLLEVNGKPSSSKAYK--LAFVRQEDLFFSQLTVRETLSFAAELQLPEIS 182

Qy 160 NPGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEP 219

: : I : : : I I II : I : : I I I I I : : I : I : I : I : I h III

Db 183 SAEERDEYWNLLLKLGLVSCADSCVGDAKVRGISGGEKKRLSLACELIASPSVIFADEP 242

Qy 220 TTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPA- 278

I I I I I I : : : I : I I : I : : I I I I I : : I I I : I : I I : : I I I

Db 243 TTGLDAFQAEKVMETLQKLAQDGHTVICSIHQPRGSVYAKFDDIVLLTEGTLVYAG-PAG 301

Qy 279 -EMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAY--KKSAIC 335

I I : I : I : I I I I I I : I I I I I I II : I I I : I : : I : :

Db 302 KEPLTYFGNFGFLCPEHVNPAEFLADLISVDYSSSETVYSSQKRVHALVDAFSQRSSSVL 361

Qy 336 HKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRV--------TRNLVRNKLAVI 387

: I : : : I : : I : : : I I : I I I I : : : I

Db 362 YATPLSMKEETKNGMRPRRKAIVERTDGWWRQFFLLLKRAWMQASRDGPTNKVRARMSVA 421

Qy 388 TRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRA 447

: :: | : I : : I I I I : I I I I : I : I I I I

Db 422 SA VI FGSVFWRMGKSQTSIQDRMGLLQVAAINTAMAALTKTVGVFPKERA 471

Qy 448 VSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALL 507

: I : I II : | : : : I : : I : I I I : I : : I I I I :

Db 472 IVDRERSKGSYSLGPYLLSKTIAEIPIGAAFPLMFGAVLYPMARLNPTLSRFGKFCGIVT 531



Qy 508 APHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTF 567

: I : : I : : I I : : I I : I I I I : I : 

Db 532 VESFAASAMGLTVGAWPSTEAAMAVGPSLMTVFIVFG-GYYVNADNTPIIFRWIPRASL 590 

Qy 568 QKYCSEILVVNEFYGLNF 585

: : : I : I I I I I I 
Db 591 IRWAFQGLCINEFSGLKF 608 



RESULT 7 
T47650 

ABC transporter-like protein - Arabidopsis thaliana 

N; Alternate names: protein T15C9.110 

C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 20-Apr-2000 #sequence_revision 20-Apr-2000 #text_change 19-May-2000 
C; Accession: T47650 

R;Mewes, H.W.; Rudd, S.; Lemcke, K.; Mayer, K.F.X.

submitted to the Protein Sequence Database, April 2000 

A; Reference number: Z24470 

A; Accession: T47650 

A; Status: preliminary 

A;Molecule type: DNA 

A; Residues: 1-708 <MEW> 

A;Cross-references: EMBL:AL132970

A; Experimental source: cultivar Columbia; BAC clone T15C9 
C; Genetics : 
A;Map position: 3 
A;Note: T15C9.110 

C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology 

Query Match 18.3%; Score 608; DB 2; Length 708; 

Best Local Similarity 27.5%; Pred. No. 1.3e-37; 

Matches 171; Conservative 136; Mismatches 256; Indels 58; Gaps 15; 

Qy 18 NRGSQSSLEGAPA — TAPEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLY 75 

I : I 1:111 I I I : I I : I II : I : : : : I I : : 

Db 41 NAPTQHILDLAPAAETRSVPFLLSFNNLSYNWLRRR — FDFSRRKTASVKTLLDDITGE 98 

Qy 76 VESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNG-RALRREQFQDCFSYVLQ 134

I : I : : I I I I : I I : I I : I I : : I I : : I I : I I : I : : : I I : I 

Db 99 ARDGEILAVLGGSGAGKSTLIDAIAGRVAE-DSLKGTVTLNGEKVLQSRLLKVTSAYVMQ 157

Qy 135 SDTLLSSLTVRETLHYTALLAIRRGNPGSFQ-KKVEAVMAELSLSHVADRLIGNYSLGGI 193

II I I I : I I I : : : I I I : : : I I : : : I I : I I : I I : I : 

Db 158 DDLLFPMLTVKETLMFASEFRLPRSLPKSKKMERVETLIDQLGLRNAADTVIGDEGHRGV 217 

Qy 194 STGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPR 253

I I I I I I I I I : : I I : : I I I I : I I I I : I : I : I : : I : : : I I I I 
Db 218 SGGERRRVSIGIDIIHDPILLFLDEPTSGLDSTNAFMWQVLKRIAQSGSWIMSIHQPS 277 

Qy 254 SELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDL TSV 307 

: : I I : : I I I I : : I I : I : I I : I I I I I : I : I : : I 

Db 278 ARIIGLLDRLIILSHGKSVFNGSPVSLPSFFSSFGRPIPEKENITEFALDVIRELEGSSE 337 



Qy 308 DT QSKEREI ETSKRVQMI E SAYKKSAICHKTLKNIERMKHLKTLP 352 

I I : : I I I : I I : : : I I : : : 

Db 338 GTRDLVEFNEKWQQNQTARATTQSRVSLKEAIAASVSRGKLVSGSSGANPISMETVSSYA 397 

Qy 353 MVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVL 412

I : : : | : | : | : | : | : : : I I I I : I 

Db 398 NPP ■ LAET FI LAKRYI KNWI RTPELI GMRI GTVMVT GLLLATVYWRL-DNT P 447 

Qy 413 KGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVL 472

:|| |:|:| : I :| : : : :| I : :|: I: ::::M I 

Db 448 RGA-QERMG-FFAFGMSTMFYCCADNIPVFIQERYIFLRETTHNAYRTSSYVISHALVSL 505 

Qy 473 PFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNS 532

I : : : I : : : I I : I I : I I : : I I : : I : : I I :: I 

Db 506 PQLLALSIAFAATTFWTVGLSGGLESFFYYCLIIYAAFWSGSSIVTFISGLI — PNVMMS 563 

Qy 533 -VVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSN 591

: I : : : I : II I : I : : I : II I : : : I I I 
Db 564 YMVTIAYLSYCLLLGGFYINRDRIPLYWIWFHYISLLKYPYEAVLINEF 612 

Qy 592 VSVTTNPMCAFTQGIQFIEKT 612 

: I I : I : I : I 

Db 613 DDPSRCFVKGVQVFDGT 629 



RESULT 8 
FYFFW 

white protein - fruit fly (Drosophila melanogaster)
C; Species: Drosophila melanogaster 

C;Date: 31-Dec-1990 #sequence_revision 17-Feb-1995 #text_change 19-Jan-2001 
C;Accession: S08635; S07263; S10240 
R;Pepling, M.; Mount, S.M.
Nucleic Acids Res. 18, 1633, 1990 

A; Title: Sequence of a cDNA from the Drosophila melanogaster white gene. 
A;Reference number: S08635; MUID:90221897; PMID:2109311
A;Accession: S08635
A; Molecule type: mRNA 
A; Residues: 1-687 <PEP> 

A;Cross-references: EMBL:X51749; NID:g8825; PIDN:CAA36038.1; PID:g8826
R;O'Hare, K.; Murphy, C.; Levis, R.; Rubin, G.M.
J. Mol. Biol. 180, 437-455, 1984 

A; Title: DNA sequence of the white locus of Drosophila melanogaster. 
A;Reference number: S07263; MUID:85134865; PMID:6084717
A;Accession: S07263 
A;Molecule type: DNA 

A;Residues: 1-24,'LIFEIPYHCRVTAD',30-334,'ITLHLNSYPAWVPSVLPTTIRRTFTYRCWPLCPDGRSSPVIGSPRYG',372-687 <OHA1>

A; Cross-references: EMBL:X02974 

A; Experimental source: strain Canton S 

R;O'Hare, K.

submitted to the EMBL Data Library, June 1985 
A;Reference number: S10240 
A; Accession: S10240 
A; Molecule type: DNA 

A;Residues: 1-24,'LIFEIPYHCRVTAD',30-687 <OHA2>

A;Cross-references: EMBL:X02974; NID:g10873; PIDN:CAA26716.1; PID:g10874
A; Experimental source: strain Canton S 



C; Genetics : 

A; Gene: white; w 

A;Cross-references: FlyBase:FBgn0003996
A;Introns: 24/3; 116/1; 334/2; 439/3; 483/3 

C; Superfamily: fruit fly white protein; ATP-binding cassette homology 

C; Keywords: ATP; glycoprotein; nucleotide binding; P-loop; transmembrane protein 

F;113-317/Domain: ATP-binding cassette homology <ABC>

F;130-137/Region: nucleotide-binding motif A (P-loop)

F;261-265/Region: nucleotide-binding motif B

F;67,93,472,554,651/Binding site: carbohydrate (Asn) (covalent) #status
predicted

Query Match 18.1%; Score 602.5; DB 1; Length 687; 

Best Local Similarity 28.8%; Pred. No. 3.3e-37; 

Matches 180; Conservative 131; Mismatches 220; Indels 95; Gaps 19; 

Qy 66 RQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR--LGRAGTFLGEVYVNGRALRRE 123

: : I I : I I : : : : : I I I I : I I I I I I : I : : I I : I : I I : : : 

Db 110 KHLLKNVCGVAYPGELLAVMGSSGAGKTTLLNALAFRSPQGIQVSPSGMRLLNGQPVDAK 169 

Qy 124 Q FQ D C F S YVLQ S DT L L S S LT VRET LH YT AL LAI RRGN P G S FQ K KVEAVMAELSLSHV 180 

: I : I I I I : I II I I I : I : : I : : : : : I : I : I I I I I 

Db 170 EMQARCAYVQQDDLFIGSLTAREHLI FQAM — VRMP RH LT YRQ RVARVDQVI Q EL S L S KC 227 

Qy 181 ADRLIG-NYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239 

: I I : | : | | I I : I : : I : : I I I : : : I I I I : I I I I I : : I : I : I : 

Db 228 QHTIIGVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFTAHSWQVLKKLS 287 

Qy 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

:: : I : I I I I I I I M I : I I I I I :: : I : I I I I : I : I I I : I II : II I 

Db 288 QKGKTVILTIHQPSSELFELFDKILLMAEGRVAFLGTPSEAVDFFSYVGAQCPTNYNPAD 347 

Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I I : : : I I I I I : I : I : I | : : : | : : I I I 
Db 348 FYVQVLAV VPGREIESRDRIAKICDNFAIS KVARDMEQ L LAT KN L E KPL 396 

Qy 360 DSP GVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL-LFFVLRVRSN 410 

: I I : : I I : : : : | | I I : I : : : : I I : : : 

Db 397 EQPENGYT YKATWFMQFRAVLWRSWLSVLKEPLLVKVRLIQTTMVAI LI GLI FLGQQLTQ 456 

Qy 411 VLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALH 470

I : : | : : | : : : : I : I I : I : : I I : I : 

Db 457 V GVMN I N GAI FL FLTNMT FQNVFAT I NVFT S E L P VFMREARS RL Y RC DT Y FL GKT I A 513 

Qy 471 VLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIV 530

II: : : I : : : I : I I II I I I : : I I : 

Db 514 ELPLFLTVPLVFTAIAYPMIGLRAGVLHF FNCLALVTLV — ANVS 556 

Qy 531 NS WALL SI AG VLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEI 574 

I 1:11 I: I I I :|: | :| I :: :| :| 

Db 557 TSFGYLISCASSSTSMALSVGPPVIIPFLLFGGFFLNSGSVPVYLKWLSYLSWFRYANEG 616 

Qy 575 LWNEFYGL NFTCGS SNVSVTTNPMCAFTQGIQFI EKTCP — GATSRFTMNFLILYS 629 

I : : I : : : : I I I I I I I I I : I I : 

Db 617 LLINQWADVEPGEISCTSSNT TCPSSGKVILETLNFSA— A 655 



Qy 630 FIPALVILGIVVFKIRDHLISR 651

: I llllll:: : I I

Db 656 DLPLDYVGLAIL-IVSFRVLAYLALR 680



RESULT 9 
B96573 

protein F12M16.17 [imported] - Arabidopsis thaliana 
C; Species: Arabidopsis thaliana (mouse-ear cress) 

C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 23-Mar-2001 
C;Accession: B96573 

R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.
Nature 408, 816-820, 2000

A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.; Langin-
Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.; Liu,
S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.; Miranda,
M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.; Pham, P.K.;
Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.

A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.; van Aken,
S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.; Fraser, C.M.;
Venter, J.C.; Davis, R.W.

A;Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis. 

A; Reference number: A86141; MUID: 21016719; PMID: 11130712 

A;Accession: B96573 

A; Status: preliminary 

A;Molecule type: DNA 

A; Residues: 1-590 <STO> 

A;Cross-references: GB:AE005173; NID:g7769856; PIDN:AAF69534.1; GSPDB:GN00141
C; Genetics : 
A;Gene: F12M16.17 
A; Map position: 1 

C;Superfamily: fruit fly white protein; ATP-binding cassette homology

Query Match 17.9%; Score 597; DB 2; Length 590; 

Best Local Similarity 29.6%; Pred. No. 7e-37; 

Matches 186; Conservative 113; Mismatches 270; Indels 60; Gaps 15; 

Qy 29 PATAPEP HSLGILHASYSVSHRVRPWWDITS-CRQQWTRQILKDVSLYVESGQI 81 

I I I I : I : I I : : : : : : : I I I I I I I : I 

Db 4 PVKAPIPGGREISYRLETKNLSYRIGGNTPKFSNLCGLLSEKEEKVILKDVSCDARSAEI 63 

Qy 82 MCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSS 141 

I I I I : I I I I I I : : : I : : I I : I I I I I : : : : Mill 

Db 64 TAIAGPSGAGKTTLLEILAGKVSH-GKVSGQVLVNGRPMDGPEYRRVSGFVPQEDALFPF 122

Qy 142 LTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRV 201

I I I : I I I I : I I I : : : I I : : : I I I I I I I II I I I I I I I I I I 

Db 123 LTVQETLTYSALLRLKTKRKDA-AAKVKRLIQELGLEHVADSRIGQGSRSGISGGERRRV 181 

Qy 202 SIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA-RRNRIVVLTIHQPRSELFQLF 260

II : I : I I I : I I I I I : I I I : I I : I I I : : : : : : I I I I I I I : :

Db 182 SIGVELVHDPNVILIDEPTSGLDSASALQWTLLKDMTIKQGKTIVLTIHQPGFRILEQI 241



Qy 261 DKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMD LTSVDTQSKEREI 316 

1:1 :|| I :: h : I: I I :: :| I ' : III III 

Db 242 DRIVLLSNGMWQNGSVYSLHQKIKFSGHQIPRRVNVLEYAIDIAGSLEPIRTQSC-REI 300 

Qy 317 ETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVT 376

III Ml: :: : I I :: : I : I 

Db 301 SCYGHS KTWKSC YISAGGELHQSDSHSNSVLEEVQILGQRSC 342 

Qy 377 RNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGML 436 

: I : I I I I I I I I I I I I I I I I I : I : : 

Db 343 KNIFRTKQLFTTRALQASIAGLILGSIYLNV-GNQKKEAKVLRTG-FFAFILTFLLSSTT 400 

Qy 437 NAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEV 496

: : I I : : I : I : : I I I : I I : : : I : I : : I I : I I I : 

Db 401 EGLPIFLQDRRILMRETSRRAYRVLSYVLADTLIFIPFLLIISMLFATPVYWLVGLRREL 460

Qy 497 ARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMP 556

MM: | : : | | : | | :: | : : | | : : | 

Db 461 DGFLYFSLVIWIVLLMSNSFVACFSALVPNFIMGTSVISGL-MGSFFLFSGYFIAKDRIP 519 

Qy 557 IPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGA 616

: : : : I : I I I I ■ : : I I : : I I : : : 

Db 520 VYWEFMHYLSLFKYPFECLMINEY RGDVFLKQQDLKE 556 

Qy 617 TSRFTMNFLILYSFIPALVILGIVVFKIR 645

: ::: I hill Ml : I 
Db 557 SQKWS-NLGIMASFIVGYRVLGFFILWYR 584 



RESULT 10 
T31958 

hypothetical protein F02E11.1 - Caenorhabditis elegans 
C; Species: Caenorhabditis elegans 

C;Date: 29-Oct-1999 #sequence_revision 29-Oct-1999 #text_change 31-Jan-2000 
C;Accession: T31958 
R;Favello, A.; Scheet, P. 

submitted to the EMBL Data Library, July 1997 

A; Description: The sequence of C. elegans cosmid F02E11. 

A; Reference number: Z21104 

A; Accession: T31958 

A; Status: preliminary; translated from GB/EMBL/DDBJ 
A; Molecule type : DNA 
A; Residues: 1-658 <FAV> 

A;Cross-references: EMBL:AF016661; PIDN:AAB66050.1; GSPDB:GN00020; CESP:F02E11

A; Experimental source: strain Bristol N2; clone F02E11 

C; Genetics : 

A;Gene: CESP:F02E11.1

A; Map position: 2 

A;Introns: 115/3; 158/3; 214/3; 330/3; 368/2; 448/3; 525/1 

C;Superfamily: fruit fly white protein; ATP-binding cassette homology

Query Match 17.9%; Score 595.5; DB 2; Length 658; 

Best Local Similarity 27.1%; Pred. No. l.le-36; 

Matches 165; Conservative 121; Mismatches 255; Indels 67; Gaps 11 
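
The per-hit statistics in this report are consistent with "Best Local Similarity" being the number of identical positions divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). The report itself does not state this formula, so the following is a minimal sketch under that assumption; the function name is hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all alignment columns, rounded to one decimal.

    Assumption (not stated in the report): the denominator is the sum of the
    four column counts printed on the "Matches; Conservative; ..." line.
    """
    aligned_columns = matches + conservative + mismatches + indels
    if aligned_columns == 0:
        raise ValueError("alignment has no columns")
    return round(100.0 * matches / aligned_columns, 1)


if __name__ == "__main__":
    # Counts copied from the statistics line directly above.
    print(best_local_similarity(165, 121, 255, 67))
```

Under this reading, the same calculation can be repeated for any hit in the listing as a quick sanity check on OCR-damaged counts.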



Qy 72 VSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSY 131

II I | : : : : : | | | : | | | | | : : : : : | | : | | | | : : : : : : : | 

Db 79 VSGVAEPGEVLALMGGSGAGKTTLMNIIJVHLDTNGVEYLGDWWGKKITKQKMRQMCA^ 138 

Qy 132 VLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHVADRLIG-NYSL 190

III : I I I I I I I I I : : : : : | | | : : : : | : : | | | : 

Db 139 VQQVDLFCGTLTVREQLTYTAHMRMKNATVQQKMERVENVLRDMNLTDCQNTLIGIPNRM 198 

Qy 191 GGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIH 250

III I I : : I : : I : : I III:: I I I I : I I I I : : : I I : : I I : : : : : : I 
Db 199 KGISIGEKKRLAFACEILTDPKILFCDEPTSGLDAFMASEVVRALLDLANKGKTIIVVLH 258 

Qy 251 QPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCG--YPCPEHSNPFDFYMDLTSVD 308

I I I : I : : I I : : : I : : : I : I I : I : II I I I I I I : 

Db 259 QPSSTVFRMFHKVCFMATGKTVYHGAVDRLCPFFDKLGPDFRVPESYNPADFVMSEISI- 317 

Qy 309 TQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKL 368 

I I I I I : : : I : I I : I I I I : : I 

Db 318 — SPETEQEDVTRIEYLIHEYQNSDIGTQMLK KTRTAVDEFGGY 359 

Qy 369 G VLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRS 409 

I : I I : I I I : I : I I I : : II 

Db 360 GDDEDDGESRYNSTFGTQFEILLKRSLRTTFRDPLLLRVRFAQILATAI LVGIVNWRVE- 418 

Qy 410 NVLKG-AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYA 468 

III II: I :: I : : I I : I : I : : I I I : 

Db 419 — LKGPT I QNLEGVMYNCARDMT FLFYFPSVNVI T S ELPVFLREHKSNI YS VEAYFLAKS 476 

Qy 469 LHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPN 528

III : I I : :: I I I I III : : : I I : 
Db 477 LAELPQYTILPMIYGTIIYWMAGLVASVTSFLVFVFVCITLTWVAVSIAYVGACIFGDEG 536 

Qy 529 IVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCG 588

: I : : : : : I I II I : I : : : : I : : : I : II I : : : : I 
Db 537 LWTFMPMF-VLPMLVFGGFYVNANS I PVYYQYVS FVSWFKHGFEALEANQWKEI DKI SG 595 

Qy 589 SSNVSVTTNPMCAFTQGIQFIEKTCP GAT S RFTMN FL I L YS FI P ALVI L 637 

: Ihlll II I : I I I I : I : 

Db 596 CDLI NPLNATTTGY CPASDGPGILTRRGIDTPLYANVLILFMSFFVYRII 645 

Qy 638 GIVVFKIR 645

I : I III
Db 646 GLVALKIR 653



RESULT 11 
T02567 

probable ATP-binding cassette protein T16B24.1 - Arabidopsis thaliana
N;Alternate names: protein F12L6.1
C;Species: Arabidopsis thaliana (mouse-ear cress)
C;Date: 10-Sep-1999 #sequence_revision 10-Sep-1999 #text_change 02-Mar-2001
C;Accession: T02567; T00545; C84816
R;Rounsley, S.D.; Kaul, S.; Lin, X.; Ketchum, K.A.; Crosby, M.L.; Brandon, R.C.;
Sykes, S.M.; Mason, T.M.; Kerlavage, A.R.; Adams, M.D.; Somerville, C.R.;
Venter, J.C.
submitted to the EMBL Data Library, August 1998
A;Description: Arabidopsis thaliana chromosome II BAC T16B24 genomic sequence.
A;Reference number: Z14679
A;Accession: T02567
A;Status: translated from GB/EMBL/DDBJ
A;Molecule type: DNA
A;Residues: 1-740 <ROU>
A;Cross-references: EMBL:AC004697; NID:g3402671; PIDN:AAC28975.1; PID:g3402672
A;Experimental source: cultivar Columbia
R;Rounsley, S.D.; Lin, X.; Ketchum, K.A.; Crosby, M.L.; Brandon, R.C.; Sykes,
S.M.; Kaul, S.; Mason, T.M.; Kerlavage, A.R.; Adams, M.D.; Somerville, C.R.;
Venter, J.C.
submitted to the EMBL Data Library, July 1998
A;Description: Arabidopsis thaliana chromosome II BAC F12L6 genomic sequence.
A;Reference number: Z14168
A;Accession: T00545
A;Status: translated from GB/EMBL/DDBJ
A;Molecule type: DNA
A;Residues: 1-362 <ROW>
A;Cross-references: EMBL:AC004218; NID:g3355463; PIDN:AAC27826.1; PID:g3355464
A;Experimental source: cultivar Columbia
R;Lin, X.; Kaul, S.; Rounsley, S.D.; Shea, T.P.; Benito, M.I.; Town, C.D.;
Fujii, C.Y.; Mason, T.M.; Bowman, C.L.; Barnstead, M.E.; Feldblyum, T.V.; Buell,
C.R.; Ketchum, K.A.; Lee, J.J.; Ronning, C.M.; Koo, H.; Moffat, K.S.; Cronin,
L.A.; Shen, M.; VanAken, S.E.; Umayam, L.; Tallon, L.J.; Gill, J.E.; Adams,
M.D.; Carrera, A.J.; Creasy, T.H.; Goodman, H.M.; Somerville, C.R.; Copenhaver,
G.P.; Preuss, D.; Nierman, W.C.; White, O.; Eisen, J.A.; Salzberg, S.L.; Fraser,
C.M.; Venter, J.C.
Nature 402, 761-768, 1999
A;Title: Sequence and analysis of chromosome 2 of the plant Arabidopsis
thaliana.
A;Reference number: A84420; MUID:20083487; PMID:10617197
A;Accession: C84816
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-740 <STO>
A;Cross-references: GB:AE002093; NID:g3402672; PIDN:AAC28975.1; GSPDB:GN00139
C;Genetics:
A;Gene: At2g39350; T16B24.1; F12L6.1
A;Map position: 2
C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology
C;Keywords: ATP
F;110-310/Domain: ATP-binding cassette homology <ABC>

Query Match 17.8%; Score 591.5; DB 1; Length 740;

Best Local Similarity 27.5%; Pred. No. 2.5e-36;

Matches 191; Conservative 123; Mismatches 267; Indels 113; Gaps 19;

Qy 35 PHSLGILHASYSVSHRVRPWWD ITSCRQQWTRQILKDVSLYVESGQ 80 

II : : I : II I I I I I : I : : I : : I I : 

Db 64 PFVLSFDNLTYNVS — VRPKLDFRNLFPRRRTEDPEIAQTARPKTKTLLNNISGETRDGE 121 

Qy 81 IMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLS 140

II : I i : I I I M : I I : I I : : I : : I : I I : I I |: : : I I : I I I 
Db 122 IMAVLGASGSGKSTLIDALANRIAK-GSLKGTVKLNGETLQSRMLKVI SAYVMQDDLLFP 180 

Qy 141 SLTVRETLHYTALLAIRRGNPGSFQK-KVEAVMAELSLSHVADRLIGNYSLGGISTGERR 199

I I I I I I : I : I I I : I : I : I :: : I : : I : I I : I I I I I I I 



Db 181 MLTVEETLMFAAEFRLPRSLPKSKKKLRVQALIDQLGIRNAAKTIIGDEGHRGISGGERR 240 

Qy 200 RVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQL 259

II I I : : II : : I I I I : I I I : I : I : I : I : I I : : : I I I I : I 
Db 241 RVSIGIDIIHDPILLFLDEPTSGLDSTSAFMWKVXKRIAQSGSIVIMSIHQPSHRVLGL 300 

Qy 260 FDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKERE 315 

I : : III : : I : I I : I I : I I I I : I : I : I I : I I 

Db 301 LDRLIFLSRGHTVYSGSPASLPRFFTEFGSPIPENENRTEFALDLI RELEGSAG 354 

Qy 316 IETSKRVQMIE SAYKKSAI CHKTLKNI ERMK 346

I I : I : I : : II : : I I I 

Db 355 GTRGLIEFNKKWQEMKKQSNRQPPLTPPSSPYPNLTLKEAIAASISRGKLVSGGESVAHG 414 

Qy 347 HLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLF 402 

: I I : I : : : : | : | | | | : : I I I : I 

Db 415 GATTNTTTLAVPAFANP MWI EI KTLSKRSMLNSRRQPELFGI RIASWITG-FI LA 469

Qy 403 FVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQ 462

I I I I : I : I : I : I : | : : I : : I I : : I : I : : 

Db 470 TVFWRLDNSPKG-VQERLG-FFAFAMSTMFYTCADALPVFLQERYIFMRETAYNAYRRSS 527 

Qy 463 MMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLG 522

: I : : I : I : : : I : : I I : I I : : : : I I I II 

Db 528 YVLSHAIVSFPSLI FLSVAFAATTYWAVGLDGGLTGLLFYCLI ILAS FWSGS S FVTFLSG 587 

Qy 523 IVQNPNIV NSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNE 579 

: I I : : : I I I : I I I : I I I I : I : I : II I : : I I 

Db 588 W — PSVMLGYTIWAIL — AYFLLFSGFFINRNRIPDYWIWFHYMSLVKYPYEAVLQNE 643 

Qy 58 0 FYGLN — FTCG SSNVSVTTNPMCAFTQG 605 

I II I :: I I : I I 

Db 644 FSDATKCFVRGVQIFDNTPLGELPEVMKLKLLGTVSKSLGVTISSTTCLTTGSDILRQQG 703 

Qy 606 -IQFIEKTCPGATSRFTMNFLILYSFIPALVILG 638 

: I : I I I I I I : I : : I I 

Db 704 WQLSKWNCLFITVAFGFFFRILFYF TLLLG 734 



RESULT 12 
G02068 

white homolog - human
C;Species: Homo sapiens (man)
C;Date: 21-Dec-1996 #sequence_revision 06-Jun-1997 #text_change 02-Feb-2001
C;Accession: G02068
R;Croop, J.M.; Tiller, G.; Fletcher, J.A.; Lux, M.; Raab, E.; Goldenson, D.;
Arciniegas, S.; Son, D.; Wu, R.
submitted to the EMBL Data Library, August 1995
A;Reference number: H00769
A;Accession: G02068
A;Status: preliminary; translated from GB/EMBL/DDBJ
A;Molecule type: mRNA
A;Residues: 1-638 <CRO>
A;Cross-references: EMBL:U34919; NID:g1314276; PIDN:AAC51098.1; PID:g1314277
C;Genetics:
A;Gene: white
C;Superfamily: fruit fly white protein; ATP-binding cassette homology
C;Keywords: ATP; nucleotide binding; P-loop
F;61-253/Domain: ATP-binding cassette homology <ABC>
F;78-85/Region: nucleotide-binding motif A (P-loop)

Query Match 17.8%; Score 590.5; DB 2; Length 638;

Best Local Similarity 26.3%; Pred. No. 2.4e-36;

Matches 164; Conservative 142; Mismatches 266; Indels 51; Gaps 14;

Qy 44 SYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRL 103 

I I I I III I : : : : I I : I I I : : : I : I I I : I I : I I : : : : I 

Db 43 SYSVPE — GPWW RKKGYKTLLKGISGKFNSGELVAIMGPSGAGKSTLMNILAGY- 94 

Qy 104 GRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGS 163

I I I : I I I : I : : I I I I I I I : I : : I I : : : I 

Db 95 -RETGMKGAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQEKDEGR 153 

Qy 164 FQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGL 223

: : I : : : II I I I : I I : I : I : : I I : I : : I I I I I I I I : I I 

Db 154 -REMVKEILTALGLLSCA NTRTGSLSGGQRKRLAIALELVNNPPVMFFDEPTSGL 207 

Qy 224 DCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDF 283

I : I : I I : I I : I : : I I II I : : I I : I I I : : : I I I : : : I : : : 

Db 208 DSAS CFQWS LMKGLAQGGRS 1 1 CT I HQP S AKLFELFDQL YVXSQGQCVYRGKVCNLVP Y 267 

Qy 284 FNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA 1 334 

I I I I : I I I I I : : I : : : : I I : I : I : : 
Db 268 LRDLGLNCPTYHNPADFVTyiEVASGEYGDQNSRLvT^AVREGMCDSDHKRDLGGDAEVNPFL 327 

Qy 335 CHKTLKNIERMKHLKTLPMVPFKTKDSPGV FSKLGVLLRRVTRNLVRNKL 384 

I : : : : : I I I I III: : : : I : I : : : I : : 

Db 328 WHRPSEEVKQTKRLKGL RKDS S SMEGCHS FSASCLTQFCI LFKRTFLS IMRDSV 381 

Qy 385 AVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPV 444 

I : : : : I I : I : : I : I h :. : : : I I I : 

Db 382 LTHLRITSHIGIGLLIGLLYLGIGNEAKK — VLSNSGFLFFSMLFLMFAALMPTVLTFPL 439 

Qy 445 LRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSA 504

I : I : I | | : : | | : : : : | : | | : I I : I 

Db 440 EMGVFL REH LN YW Y S L KAY YLAKTMADVP FQ I MF P VAYC S I VYWMT SQ P S DAVAFVL FAA 499 

Qy 505 ALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISY 564

I : : I I I : I I : I : : I I : I I I : : I : : I I 

Db 500 LGTMTSLVAQSLGL-LIGAASTSLQVATFVGPVTAI PVLLFSGFFVSFDTIPTYLQWMSY 558 

Qy 565 FTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF 624

: : : I I : : : : III: : : : III : I : : : : : I 

Db 559 ISYVRYGFEGVILS-IYGLD REDLHCDI DETCHF-QKSEAI LRELDVENAKLYLDF 612 

Qy 625 LILYSFIPALVILGIVV--FKIR 645

::| I :| :: I MM 
Db 613 IVLGIFFISLRLIAYFVXRYKIR 635 



RESULT 13 
C86441 

probable ABC transporter [imported] - Arabidopsis thaliana
C;Species: Arabidopsis thaliana (mouse-ear cress)
C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 31-Mar-2001
C;Accession: C86441
R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.
Nature 408, 816-820, 2000
A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.;
Langin-Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.; Liu,
S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.; Miranda,
M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.; Pham, P.K.;
Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.
A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.; van Aken,
S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.; Fraser, C.M.;
Venter, J.C.; Davis, R.W.
A;Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis.
A;Reference number: A86141; MUID:21016719; PMID:11130712
A;Accession: C86441
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-646 <STO>
A;Cross-references: GB:AE005172; NID:g11136734; PIDN:AAG31315.1; GSPDB:GN00141
C;Genetics:
A;Map position: 1
C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology

Query Match 17.7%; Score 589.5; DB 2; Length 646;

Best Local Similarity 29.9%; Pred. No. 2.9e-36;

Matches 183; Conservative 112; Mismatches 246; Indels 71; Gaps 22;

Qy 10 GGSM— GLQVNRGSQS-SLEGAPATAPEPHSLGILHASYSVS HRVRPWWDITSCR 61 

I I I I I : I I I : I : I I : : I :: :: I : : | 

Db 14 GGVMVQGLPDMSDTQSKSVLAFPTITSQP GLQMSMYPITLKEWYKVK-IEQTSQCM 69 

Qy 62 QQW TRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGR 118 

I : I I : : I I : : : I I I I I I I I I I I I : I I I : I I I : I II 

Db 70 GSWKSKEKTILNGITGMVCPGEFLAMLGPSGSGKTTLLSALGGRLSK — TFSGKVMYNG- 126 

Qy 119 ALRREQFQDCF SYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQK KVE 169 

: I I : I I I I 111111:1111: II : I : 

Db 127 QPFSGCIKRRTGFVAQDDVLYPHLTVWETLFFTALLRL PSSLTRDEKAEHVD 17 8 

Qy 170 AVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTAN 229

I : I I I I : : : I I I I I I I : : I I I I : : I : I : : I II I I : I I I I I : 

Db 179 RVIAELGLNRCTNSMIGGPLFRGISGGEKKRVSIGQEMLINPSLLLLDEPTSGLDSTTAH 238 

Qy 230 QIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGY 289

: I I : II I I I I I I I I I ::: I I I :: I I I hi : : : : I : I : 
Db 239 RIWTIKRLASGGRTVA/TTIHQPSSRIYHMFDKVVLLSEGSPIYYGAASSAVEYFSSLGF 298 

Qy 290 PCPEHSNPFDFYMDLTS VDTQSKEREIETSKRVQMIESAYK KSAICHKTL 339 

III : I I : III: II : : I I I : I : : I : 



Db 299 STSLTVNPADLLLDLANGIPPDTQKETSEQEQKTVKETLVSAYEKNISTKLKAELCNAES 358



Qy 340 KNIERMK-HLKTLPMVP FKTKDSPGVFSKLGVLLRRVTRNL VRNKLAVTTRLLQNLI 395 

: I I II : I : : I I I : I I III: : : 

Db 359 H S YE YT KAAAKN LKS EQWCTT WWYQFTVLLQRGVRERRFES FNKLRI FQVI SVAFL 414 

Qy 396 MGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQD 455 

I II:: : I I I I I I I : I : : I I I II : : : I 

Db 415 GG — LLWW HTPKSHIQDRTALLFFFSVFWGFYPLYNAVFTFPQEKRMLIKERSS 466 

Qy 456 GLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEF 515

I : I : : | : | | : I : II II |: I : I I : : 

Db 467 GMYRLSSYFMARNVGDLPLELALPTAFVFIIYWMGGLKPDPTTFILSLLVVLYSVLVAQG 526 

Qy 516 LTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKI-ISYFTFQKYCSEI 574

II : : I :::::: : : | : : | : : | | | : : | : : | | : : 

Db 527 LGLAFGAIiLMNI KQATTLASVTTLVFLIAGGYY VQQIP-PFIVWLKYLSYSYYCYKL 582 

Qy 575 LVVNEFYGLNFT 586

I : I : : I 

Db 583 LL GIQYT 589 



RESULT 14 
G84791 

probable ABC transporter [imported] - Arabidopsis thaliana
C;Species: Arabidopsis thaliana (mouse-ear cress)
C;Date: 02-Feb-2001 #sequence_revision 02-Feb-2001 #text_change 16-Feb-2001
C;Accession: G84791
R;Lin, X.; Kaul, S.; Rounsley, S.D.; Shea, T.P.; Benito, M.I.; Town, C.D.;
Fujii, C.Y.; Mason, T.M.; Bowman, C.L.; Barnstead, M.E.; Feldblyum, T.V.; Buell,
C.R.; Ketchum, K.A.; Lee, J.J.; Ronning, C.M.; Koo, H.; Moffat, K.S.; Cronin,
L.A.; Shen, M.; VanAken, S.E.; Umayam, L.; Tallon, L.J.; Gill, J.E.; Adams,
M.D.; Carrera, A.J.; Creasy, T.H.; Goodman, H.M.; Somerville, C.R.; Copenhaver,
G.P.; Preuss, D.; Nierman, W.C.; White, O.; Eisen, J.A.; Salzberg, S.L.; Fraser,
C.M.; Venter, J.C.
Nature 402, 761-768, 1999
A;Title: Sequence and analysis of chromosome 2 of the plant Arabidopsis
thaliana.
A;Reference number: A84420; MUID:20083487; PMID:10617197
A;Accession: G84791
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-755 <STO>
A;Cross-references: GB:AE002093; NID:g4056489; PIDN:AAC98055.1; GSPDB:GN00139
C;Genetics:
A;Gene: At2g37360
A;Map position: 2
C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology

Query Match 17.6%; Score 584; DB 2; Length 755;

Best Local Similarity 27.2%; Pred. No. 9.4e-36;

Matches 173; Conservative 128; Mismatches 253; Indels 82; Gaps 19;

Qy 21 SQSSLEGAPAT — APEPHSLGILHASYSVSHRVRPWWDITSCRQQW TRQILKDV 72 

I : I | | | : : | I : I I I : : : : | | : I : : I : 



Db 79 SFNSWASAPASSISSSPFVLSFTDLTYSVKIQ-KKFNPLACCRRSGNDSSVNTKILLNGI 137



Qy 73 SLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYV 132 

I I : : I : I I : I I I I I : I I : I I : : I : : : I : : I I I : : I I 

Db 138 SGE1AREGEMMAVLGASGSGKSTLIDALANRIAK-DSLRGSITLNGEVLESSMQKVISAW 196 

Qy 133 LQSDTLLSSLTVRETLHYTALLAI RRGNPGSFQKK VEAVMAELSLSHVADRLIGN 187 

: I I I I I I I I I :: I : II II I : I : : : I I I : I I : 

Db 197 MQDDLLFPMLTVEETLMFSAEFRL PRSLSKKKKKARVQALIDQLGLRSAAKTVIGD 252 

Qy 188 YSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVL 247

1:111111111 : : I I : : I I I I : I II : I : : : I : I : I I : : 
Db 253 EGHRGVSGGERRRVSIGNDI IHDPI ILFLDEPTSGLDSTSAYMVI KVLQRIAQSGS1VIM 312 

Qy 248 TIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSV 307 

: I I I I : II:: III : : I : I : II:: : I I I : I : I : I I 
Db 313 SIHQPSYRIMGLLDQLIFLSKGNTVYSGSPTHLPQFFSEFKHPIPENENKTEFALDLI — 370 

Qy 308 DTQS KEREI ET S KRVQMI E SAYKKSAICHKTLKNIERMKHLKTLP 352 

: I I I : : I : I : : I I : : I I 
Db 371 RELEYSTEGTKP L VE FH KQWRAKQAP S YNNN NKRNTNVSSLKEAITASISRGK 423 

Qy 353 MVP-FKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLF 399 

: I I : I : I : : : I : : I I I : II : : I : 

Db 424 LVSGATNNNS SNLTPS FQTFANP- FWI EMI VI GKRAI LNSRRQPELLGMRLGAVMVTGI I 482 

Qy 400 LLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQ 459 

I : I I I I I : I : I : I : I : I : : I I : : I : I : 

Db 483 LATMFTNL-DNSPKGA-QERLG-FFAFAMSTTFYTCAEAI PVFLQERYI FMRETAYNAYR 539 

Qy 460 KWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLV 519

: : I : : : : I : I : I : : : I : I I I :| :|| I 

Db 540 RS S YVL S Q S 1 1 S I PAL I VL S AS FAATT FWAVG LD GGANG F FF F Y FT I LAS FWAG S S FVT F 599 

Qy 520 LLGIVQNPNIV NSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILV 576 

II:: II:: I I I : I I hill : : I : : I : II I : : 

Db 600 LSGVI — PNVMLGFTVWAIL — AYFLLFSGFFI SRDRIPVYWLWFHYI SLVKYPYEGVL 655 

Qy 577 VNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKT 612 

II I II | : | : | : : 

Db 656 QNEF QN PT RC FARGVQL FDN S 676 



RESULT 15 
D96553 

hypothetical protein F5D21.6 [imported] - Arabidopsis thaliana
C;Species: Arabidopsis thaliana (mouse-ear cress)
C;Date: 02-Mar-2001 #sequence_revision 02-Mar-2001 #text_change 23-Mar-2001
C;Accession: D96553
R;Theologis, A.; Ecker, J.R.; Palm, C.J.; Federspiel, N.A.; Kaul, S.; White, O.;
Alonso, J.; Altaf, H.; Araujo, R.; Bowman, C.L.; Brooks, S.Y.; Buehler, E.;
Chan, A.; Chao, Q.; Chen, H.; Cheuk, R.F.; Chin, C.W.; Chung, M.K.; Conn, L.;
Conway, A.B.; Conway, A.R.; Creasy, T.H.; Dewar, K.; Dunn, P.; Etgu, P.;
Feldblyum, T.V.; Feng, J.; Fong, B.; Fujii, C.Y.; Gill, J.E.; Goldsmith, A.D.;
Haas, B.; Hansen, N.F.; Hughes, B.; Huizar, L.
Nature 408, 816-820, 2000
A;Authors: Hunter, J.L.; Jenkins, J.; Johnson-Hopson, C.; Khan, S.; Khaykin, E.;
Kim, C.J.; Koo, H.L.; Kremenetskaia, I.; Kurtz, D.B.; Kwan, A.; Lam, B.;
Langin-Hooper, S.; Lee, A.; Lee, J.M.; Lenz, C.A.; Li, J.H.; Li, Y.; Lin, X.; Liu,
S.X.; Liu, Z.A.; Luros, J.S.; Maiti, R.; Marziali, A.; Militscher, J.; Miranda,
M.; Nguyen, M.; Nierman, W.C.; Osborne, B.I.; Pai, G.; Peterson, J.; Pham, P.K.;
Rizzo, M.; Rooney, T.; Rowley, D.; Sakano, H.
A;Authors: Salzberg, S.L.; Schwartz, J.R.; Shinn, P.; Southwick, A.M.; Sun, H.;
Tallon, L.J.; Tambunga, G.; Toriumi, M.J.; Town, C.D.; Utterback, T.; van Aken,
S.; Vaysberg, M.; Vysotskaia, V.S.; Walker, M.; Wu, D.; Yu, G.; Fraser, C.M.;
Venter, J.C.; Davis, R.W.
A;Title: Sequence and analysis of chromosome 1 of the plant Arabidopsis.
A;Reference number: A86141; MUID:21016719; PMID:11130712
A;Accession: D96553
A;Status: preliminary
A;Molecule type: DNA
A;Residues: 1-687 <STO>
A;Cross-references: GB:AE005173; NID:g10092349; PIDN:AAG12758.1; GSPDB:GN00141
C;Genetics:
A;Gene: F5D21.6
A;Map position: 1
C;Superfamily: Arabidopsis thaliana probable ATP-binding cassette protein
F12L6.1; ATP-binding cassette homology

Query Match 17.5%; Score 583.5; DB 2; Length 687;

Best Local Similarity 28.0%; Pred. No. 9e-36;

Matches 167; Conservative 117; Mismatches 239; Indels 73; Gaps 16;

Qy 33 PEPHSLGILHASYSVSHRVRPWWDITSCRQQW TRQI LKDVSLYVESGQIMCI LGS S 88 

II : I : I I I : I : I I : : I : : : I I : I I I : I I 

Db 13 PPPAEIG— RGAYLA WEDLTWIPNFSGGPTRRLLDGLNGHAEPGRIMAIMGPS 64 

Qy 89 GSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETL 148

I I I I : I I I I ::: I I I I I : : I I : I : : I I I I I : : I I I I I I : 

Db 65 GSGKSTLLDSLAGRLARNVIMTGNLLLNGKKARLD — YGLVAYVTQEDILMGTLTVRETI 122 

Qy 149 H YT ALLAI RRGNP G S FQKK VEAVMAELSLSHVADRLI GNYSLGGI STGERRRVS I 203

I : I I : I : II : I I I I I I : I I I : I : I I I I : I I I : 

Db 123 TYSAHLRL SSDLTKEEVNDIVEGTIIELGLQDCADRVI GNWHSRGVSGGERKRVSV 178 

Qy 204 AAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR-RNRIVVLTIHQPRSELFQLFDK 262

I : : I I : : : I I I I : I I I : I : : I : I I I I I : I I I I I I : I I I I 
Db 179 ALEILTRPQILFLDEPTSGLDSASAFFVIQALRNIARDGGRTWSSIHQPSSEVFALFDD 238 

Qy 263 IAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRV 322 

: : I I I I : : I : : I I : I : I I I : III:: : I : : : I : I : 

Db 239 LFLLSSGETVYFGESKFAVEFFAEAGFPCPKKRNPSDHFLRCINSDFDTVTATLKGSQRI 298 

Qy 323 QMI ESAYKKSAICHKTLKNI ERMKHLKTLPMVPFKTKDS P 362 

: : : I : I : : I | : : : : : 

Db 299 RETPATSDPLMNLATSEIKARLVEN-YRRSVYAKSAKSRIRELASIEGHHGMEVRKGSEA 357 

Qy 363 GVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRVGL 422 

I : I I : I I : I : : I : I : : : II II 

Db 358 TWFKQLRTLTKRSFVNMCRD IGYYWSRIVIYIWSFCVGTIFYDVGH 404 

Qy 423 LYQFVGATP YTGMLNAVNL — FPVL RAVSDQESQDGLYQKWQMMLAYALHVL 472 

1:1 | | : : : : | | I : I II : : : : 



Db 405 S YTS I LARVSCGGFITGFMTFMS I GGFPS FI EEMKVFYKERLSGYYGVSVYI I SNYVSS F 464 

Qy 473 PFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNS 532

III : I hi : | | : : : | : : I I : I : : I I I : 

Db 465 PFLVAIALITGSITYNMVKFRPGVSHWAFFCLNIFFSVSVIESLMMVVASLV — PNFLMG 522 

Qy 533 VVALLSIAG-VLVGSGFLRNIQEMPIPF--KIISYFTFQKYCSEILVVNEFYGLNF 585

: : I I : : : I I I I : : : I I II::: : : I : I I I I 

Db 523 LITGAGIIGIIMMTSGFFRLLPDLPKVFWRYPISFMSYGSWAIQGAYKNDFLGLEF 578 



Search completed: February 27, 2004, 07:18:55 
Job time : 16.5049 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: February 27, 2004, 07:17:39 ; Search time 29.2557 Seconds 

(without alignments) 
4698.604 Million cell updates/se 

Title: US-09-989-981A-6

Perfect score: 3326 

Sequence: 1 MGDLSSLTPGGSMGLQVNRG..........PALVILGIVVFKIRDHLISR 651



Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5 



Searched: 809742 seqs, 211153259 residues

Total number of hits satisfying chosen parameters: 809742

Minimum DB seq length: 0
Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%
                 Maximum Match 100%
                 Listing first 45 summaries



Database: Published_Applications_AA:*
   1: /cgn2_6/ptodata/2/pubpaa/US07_PUBCOMB.pep:*
   2: /cgn2_6/ptodata/2/pubpaa/PCT_NEW_PUB.pep:*
   3: /cgn2_6/ptodata/2/pubpaa/US06_NEW_PUB.pep:*
   4: /cgn2_6/ptodata/2/pubpaa/US06_PUBCOMB.pep:*
   5: /cgn2_6/ptodata/2/pubpaa/US07_NEW_PUB.pep:*
   6: /cgn2_6/ptodata/2/pubpaa/PCTUS_PUBCOMB.pep:*
   7: /cgn2_6/ptodata/2/pubpaa/US08_NEW_PUB.pep:*
   8: /cgn2_6/ptodata/2/pubpaa/US08_PUBCOMB.pep:*
   9: /cgn2_6/ptodata/2/pubpaa/US09A_PUBCOMB.pep:*
  10: /cgn2_6/ptodata/2/pubpaa/US09B_PUBCOMB.pep:*
  11: /cgn2_6/ptodata/2/pubpaa/US09C_PUBCOMB.pep:*
  12: /cgn2_6/ptodata/2/pubpaa/US09_NEW_PUB.pep:*
  13: /cgn2_6/ptodata/2/pubpaa/US10A_PUBCOMB.pep:*
  14: /cgn2_6/ptodata/2/pubpaa/US10B_PUBCOMB.pep:*
  15: /cgn2_6/ptodata/2/pubpaa/US10C_PUBCOMB.pep:*
  16: /cgn2_6/ptodata/2/pubpaa/US10_NEW_PUB.pep:*
  17: /cgn2_6/ptodata/2/pubpaa/US60_NEW_PUB.pep:*
  18: /cgn2_6/ptodata/2/pubpaa/US60_PUBCOMB.pep:*

Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
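
The run header above names an "sw model" with a BLOSUM62 scoring table, Gapop 10.0 and Gapext 0.5. Assuming "sw" refers to Smith-Waterman local alignment, the score of a hit could be sketched as below with affine gap penalties (Gotoh recurrences). This is an illustration only, not GenCore's implementation: `toy_score` is a hypothetical stand-in for BLOSUM62, and the gap convention (first gapped residue costs `gap_open`, each further one costs `gap_extend`) is an assumption that the report does not confirm.

```python
def toy_score(x, y):
    """Hypothetical substitution score; a real run would look up BLOSUM62."""
    return 5.0 if x == y else -4.0


def local_align_score(a, b, score, gap_open=10.0, gap_extend=0.5):
    """Best local alignment score of a and b with affine gap penalties."""
    neg_inf = float("-inf")
    m = len(b)
    h_prev = [0.0] * (m + 1)      # best score of an alignment ending at (i-1, j)
    f_prev = [neg_inf] * (m + 1)  # same, but ending with a gap in b
    best = 0.0
    for i in range(1, len(a) + 1):
        h_cur = [0.0] * (m + 1)
        f_cur = [neg_inf] * (m + 1)
        e = neg_inf               # best score ending with a gap in a on this row
        for j in range(1, m + 1):
            # Open a new gap from the best cell, or extend an existing gap.
            e = max(h_cur[j - 1] - gap_open, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open, f_prev[j] - gap_extend)
            diagonal = h_prev[j - 1] + score(a[i - 1], b[j - 1])
            # Local alignment: a cell never drops below zero.
            h_cur[j] = max(0.0, diagonal, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    print(local_align_score("AAAAAGGGGG", "AAAAACCGGGGG", toy_score))
```

Only two rows of the dynamic-programming table are kept, so memory grows with the length of one sequence rather than with the product of both lengths; recovering the alignment itself (the Qy/Db rows) would additionally need a traceback, which is omitted here.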
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09- 


866-866A-27           Sequence 27, Appl
    18    660     19.8    657   9  US-09-866-866A-14     Sequence 14, Appl
    19    627     18.9   1095  15  US-10-369-493-2025    Sequence 2025, Ap
    20    621     18.7   1049  15  US-10-369-493-1520    Sequence 1520, Ap
    21    602.5   18.1    663  13  US-10-108-605-245     Sequence 245, App
    22    598.5   18.0    674  14  US-10-090-455-4       Sequence 4, Appli
    23    598.5   18.0    674  16  US-10-429-160-10      Sequence 10, Appl
    24    595.5   17.9    658  15  US-10-369-493-5347    Sequence 5347, Ap
    25    590.5   17.8    638  13  US-10-072-621-10      Sequence 10, Appl
    26    585.5   17.6    646  13  US-10-072-621-9       Sequence 9, Appli
    27    585.5   17.6    646  14  US-10-090-455-2       Sequence 2, Appli
    28    578.5   17.4    627  14  US-10-090-455-8       Sequence 8, Appli
    29    578     17.4    604   9  US-09-745-763-197     Sequence 197, App
    30    571.5   17.2    646  14  US-10-079-087-2       Sequence 2, Appli
    31    570.5   17.2    646  13  US-10-154-452-4       Sequence 4, Appli
    32    567.5   17.1    646  14  US-10-090-455-13      Sequence 13, Appl
    33    565.5   17.0    599  15  US-10-210-130-14      Sequence 14, Appl
    34    562.5   16.9    646  13  US-10-154-452-8       Sequence 8, Appli
    35    554     16.7    559  15  US-10-369-493-5740    Sequence 5740, Ap
    36    545.5   16.4    608  15  US-10-369-493-5748    Sequence 5748, Ap
    37    540.5   16.3    676  15  US-10-369-493-3799    Sequence 3799, Ap
    38    517.5   15.6    610  15  US-10-369-493-5687    Sequence 5687, Ap
    39    517.5   15.6    639  15  US-10-369-493-6184    Sequence 6184, Ap
    40    504     15.2    695  15  US-10-369-493-6199    Sequence 6199, Ap
    41    496.5   14.9    560  15  US-10-369-493-12899   Sequence 12899, A
    42    496     14.9    551  15  US-10-369-493-3562    Sequence 3562, Ap
    43    485     14.6    545  14  US-10-083-357-1335    Sequence 1335, Ap
    44    430.5   12.9   1395  15  US-10-369-493-4054    Sequence 4054, Ap
    45    425.5   12.8    615  10  US-09-949-029-24      Sequence 24, Appl
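
The percentage listed for each hit in the summary entries above appears to be the hit's score expressed as a fraction of the perfect score given in the run header (3326, the score of the query against itself), rounded to one decimal place. The report does not spell this out, so the following is a small sketch under that assumption; the function name is hypothetical.

```python
PERFECT_SCORE = 3326  # "Perfect score" from the run header of this search


def query_match_percent(score, perfect_score=PERFECT_SCORE):
    """Score as a percentage of the perfect score, rounded to one decimal.

    Assumption: this is how the "Query Match" value is derived; the report
    prints the numbers but does not state the formula.
    """
    if perfect_score <= 0:
        raise ValueError("perfect score must be positive")
    return round(100.0 * score / perfect_score, 1)


if __name__ == "__main__":
    # Scores taken from the summary entries for results 18, 19 and 45.
    for score in (660, 627, 425.5):
        print(score, query_match_percent(score))
```

Recomputing the percentage from the score is a cheap way to cross-check individual rows of the summary list after OCR.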



ALIGNMENTS 



RESULT 1 
US-09-837-992-3 

; Sequence 3, Application US/09837992
; Patent No. US20020081687A1
; GENERAL INFORMATION:
; APPLICANT: Tian, Hui
; APPLICANT: Schultz, Joshua
; APPLICANT: Shan, Bei
; APPLICANT: Tularik Inc.
; TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions
; TITLE OF INVENTION: and Methods of Use
; FILE REFERENCE: 018781-006020US
; CURRENT APPLICATION NUMBER: US/09/837,992
; CURRENT FILING DATE: 2001-04-18
; PRIOR APPLICATION NUMBER: US 60/198,465
; PRIOR FILING DATE: 2000-04-18
; PRIOR APPLICATION NUMBER: US 60/204,234
; PRIOR FILING DATE: 2000-05-15
; NUMBER OF SEQ ID NOS: 45
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 3
; LENGTH: 651
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: human sitosterolemia susceptibility gene (SSG)
; OTHER INFORMATION: amino acid sequence
US-09-837-992-3

Query Match 100.0%; Score 3326; DB 9; Length 651;

Best Local Similarity 100.0%; Pred. No. 1.1e-309;

Matches 651; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy   1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60

Qy  61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240

Qy 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIA 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIA 540

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMC 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMC 600

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651
       |||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 2 

US-09-989-981A-6

; Sequence 6, Application US/09989981A
; Publication No. US20030049730A1
; GENERAL INFORMATION:
; APPLICANT: Hobbs, Helen H.
; APPLICANT: Shan, Bei
; APPLICANT: Barnes, Robert
; APPLICANT: Tian, Hui
; APPLICANT: Tularik Inc.
; APPLICANT: Board of Regents, The University of Texas System
; TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
; FILE REFERENCE: 018781-007320US
; CURRENT APPLICATION NUMBER: US/09/989,981A
; CURRENT FILING DATE: 2002-07-23
; PRIOR APPLICATION NUMBER: US 60/252,235
; PRIOR FILING DATE: 2000-11-20
; PRIOR APPLICATION NUMBER: US 60/253,645
; PRIOR FILING DATE: 2000-11-28
; NUMBER OF SEQ ID NOS: 13
; SOFTWARE: PatentIn Ver. 2.1
; SEQ ID NO 6
; LENGTH: 651
; TYPE: PRT
; ORGANISM: Homo sapiens
; FEATURE:
; OTHER INFORMATION: human ABCG5 (hABCG5)
US-09-989-981A-6

Query Match 100.0%; Score 3326; DB 10; Length 651;

Best Local Similarity 100.0%; Pred. No. 1.1e-309;

Matches 651; Conservative 0; Mismatches 0; Indels 0; Gaps 0;

Qy   1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60

Qy  61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240

Qy 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIA 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIA 540

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMC 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMC 600

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651
       |||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 3 
US-10-090-455-6 

; Sequence 6, Application US/10090455 

; Publication No. US20030027259A1 

; GENERAL INFORMATION: 

; APPLICANT: Chen, Hongyun 

; APPLICANT: Le Bihan, Stephane 

; TITLE OF INVENTION: NOVEL ABCG4 TRANSPORTER AND USES THEREOF 
; FILE REFERENCE: 100103.406 

; CURRENT APPLICATION NUMBER: US/10/090,455
; CURRENT FILING DATE: 2002-03-01
; NUMBER OF SEQ ID NOS: 17

; SOFTWARE: FastSEQ for Windows Version 4.0
; SEQ ID NO 6
; LENGTH: 651
; TYPE: PRT

; ORGANISM: Homo sapiens
US-10-090-455-6 



Query Match 100.0%; Score 3326; DB 14; Length 651; 

Best Local Similarity 100.0%; Pred. No. 1.1e-309;



Matches 651; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy   1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60

Qy  61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240

Qy 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIA 540
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIA 540

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMC 600
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMC 600

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651
       |||||||||||||||||||||||||||||||||||||||||||||||||||
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 4 
US-09-837-992-1 

; Sequence 1, Application US/09837992 

; Patent No. US20020081687A1 

; GENERAL INFORMATION: 

; APPLICANT: Tian, Hui 

; APPLICANT: Schultz, Joshua 

; APPLICANT: Shan, Bei 

; APPLICANT: Tularik Inc. 

TITLE OF INVENTION: Sitosterolemia Susceptibility Gene (SSG): Compositions
TITLE OF INVENTION: and Methods of Use
FILE REFERENCE: 018781-006020US
CURRENT APPLICATION NUMBER: US/09/837,992
CURRENT FILING DATE: 2001-04-18
PRIOR APPLICATION NUMBER: US 60/198,465
PRIOR FILING DATE: 2000-04-18
PRIOR APPLICATION NUMBER: US 60/204,234
PRIOR FILING DATE: 2000-05-15
NUMBER OF SEQ ID NOS: 45
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 1
LENGTH: 652
TYPE: PRT

ORGANISM: Mus musculus
FEATURE:

OTHER INFORMATION: mouse sitosterolemia susceptibility gene (SSG) 
OTHER INFORMATION: amino acid sequence 
US-09-837-992-1 

Query Match 82.5%; Score 2744.5; DB 9; Length 652; 

Best Local Similarity 80.2%; Pred. No. 6.6e-254; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1;
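
The percentages in these summary lines can be cross-checked against the reported counts. The sketch below assumes two relationships that are inferred from the numbers in this report rather than taken from the search program's documentation: Best Local Similarity is matches divided by the total number of aligned columns, and Query Match is the hit score divided by the score of the 100% hits (3326). The function names and `SELF_SCORE` are hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all aligned columns (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


def query_match(score, self_score):
    """Hit score as a percentage of the query's self-score (assumed definition)."""
    return 100.0 * score / self_score


# Score reported for the 100% hits; assumed here to be the query's self-score.
SELF_SCORE = 3326

similarity = round(best_local_similarity(523, 64, 64, 1), 1)
match_pct = round(query_match(2744.5, SELF_SCORE), 1)
print(similarity)  # 80.2
print(match_pct)   # 82.5
```

A mismatch between a recomputed value and the printed one is a quick signal that a digit in the summary lines was misread.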

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

I I : I I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

Qy 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119

I : I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I : I I I 
Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

I I I : I I I I I I I I I I I I I I I I I I I I I I I llhll: I : : I I I I I I I I I I I I I 
Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

I I I : : I I : I : I I I I : I I I I I I I I I I II I I I I I I I : I I I I I I I I I I I I I I I I : I I III 
Db 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240

Qy 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

I I : I I I :: I I I I I I I I I I I I I I I I I I : : I I I : I I I I I III I I I : I I I I I I I I I I I I I 
Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I I I I I I I I I I I I : I I I I I I I I I I I : I I : I : I I II I : I I I I : : I I I I I I I I I I II 
Db 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

Qy 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

I I I : I I I I I I I I I I I I I I : I I I III I I : I I I I I I I I I : I : : I I I : : I I I I I : I I I 
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

Qy 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I : I I 
Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480



Qy 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539
: I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I : I I I I I I 



Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 



Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

: I : I : I I I I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I II I : : I I 
Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600 

Qy 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651

II I I I : I I I I I I I I I I I II I I II I I I I I I I I I I I I I I I : I I : I I : I I I I 
Db 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652



RESULT 5 

US-09-989-981A-2 

Sequence 2, Application US/09989981A 
Publication No. US20030049730A1
GENERAL INFORMATION: 
APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
FILE REFERENCE: 018781-007320US
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23
PRIOR APPLICATION NUMBER: US 60/252,235
PRIOR FILING DATE: 2000-11-20
PRIOR APPLICATION NUMBER: US 60/253,645
PRIOR FILING DATE: 2000-11-28
NUMBER OF SEQ ID NOS: 13
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 2
LENGTH: 652
TYPE: PRT

ORGANISM: Mus musculus
FEATURE:

OTHER INFORMATION: mouse ABCG5 (mABCG5)
US-09-989-981A-2 

Query Match 82.5%; Score 2744.5; DB 10; Length 652; 

Best Local Similarity 80.2%; Pred. No. 6.6e-254; 

Matches 523; Conservative 64; Mismatches 64; Indels 1; Gaps 1; 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

I I : I I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

Qy 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

I : I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I : I I I 
Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

I I I : I I I I I I I I I II I I I I I I I I I I I I llhll: I : : I I I I I I I I I I I I I 
Db 121 LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

I I I :: I I : I : I I I I : I I I i I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I III
Db 181 VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240



Qy 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

I I : I I I :: I I II I I I I I I I I M I I I I : : I I I : I I I I I III I I I : I I I I I I I I I I I I I 
Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I I I I I I I I I I I I : I I I II I I I I I I : I I : I : I I II I : I II I : : I I I I I I I I I I I I 
Db 301 FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPMVPFKTK 360 

Qy 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

I II : I I I I I I I I I I I I I I : I I I III I I : I I I I I I I I I : I : : I I I : : I I I I I : I I I 
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

Qy 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I : I I 

Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 

Qy 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

: I I I I I I I I I I I I : I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I 
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540 

Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPM 599

: I : I : I I I I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I I I II I : : I I 
Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILVVNEFYGLNFTCGGSNTSMLNHPM 600 

Qy 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651

II I I I : I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I : I I : I I : I I I I 

Db 601 CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652 



RESULT 6 

US-10-104-047-2795 

; Sequence 2795, Application US/10104047 
; Publication No. US20030236392A1 
; GENERAL INFORMATION: 

; APPLICANT: HELIX RESEARCH INSTITUTE 

; TITLE OF INVENTION: Novel full length cDNA
; FILE REFERENCE: H1-A0105 

; CURRENT APPLICATION NUMBER: US/10/104,047

; CURRENT FILING DATE: 2002-03-25 

; PRIOR APPLICATION NUMBER: 

; PRIOR FILING DATE: 

; NUMBER OF SEQ ID NOS: 4096

; SOFTWARE: PatentIn Ver. 2.1

; SEQ ID NO 2795 

LENGTH: 256 

TYPE: PRT 

ORGANISM: Homo sapiens 
US-10-104-047-2795 

Query Match 39.3%; Score 1308; DB 15; Length 256; 

Best Local Similarity 100.0%; Pred. No. 9.2e-117; 

Matches 256; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 



Qy 396 MGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQD 455
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db   1 MGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQD 60

Qy 456 GLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEF 515
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db  61 GLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEF 120

Qy 516 LTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEIL 575
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 121 LTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEIL 180

Qy 576 VVNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALV 635
       ||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
Db 181 VVNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALV 240

Qy 636 ILGIVVFKIRDHLISR 651
       ||||||||||||||||
Db 241 ILGIVVFKIRDHLISR 256
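
Because every alignment row carries its start and end coordinates, a row can be checked for dropped or merged residues by counting letters. The sketch below is a hedged illustration of that check; `residue_count_ok` and `flag_rows` are hypothetical helpers, and the second sample row is an example of a misread row, not a quotation of the correct sequence.

```python
def residue_count_ok(start, residues, end):
    """True if the number of residues (gaps and spaces ignored) fits the span."""
    n = sum(1 for ch in residues if ch.isalpha())
    return n == end - start + 1


def flag_rows(rows):
    """Return (label, start, end) for rows whose residue count is off."""
    return [
        (label, start, end)
        for label, start, residues, end in rows
        if not residue_count_ok(start, residues, end)
    ]


rows = [
    # A clean 60-residue row.
    ("Qy", 1, "MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC", 60),
    # A misread row: two residue pairs collapsed into single letters.
    ("Qy", 61, "RQQWTRQILKDVSLWESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVTWGRAL", 120),
]
flagged = flag_rows(rows)
print(flagged)
```

A passing count does not prove a row is correct, since a one-for-one substitution leaves the length unchanged; the check only catches insertions and deletions.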



RESULT 7 

US-09-989-981A-4 

Sequence 4, Application US/09989981A 
Publication No. US20030049730A1 
GENERAL INFORMATION: 
APPLICANT: Hobbs, Helen H. 
APPLICANT: Shan, Bei 
APPLICANT: Barnes, Robert 
APPLICANT: Tian, Hui 
APPLICANT: Tularik Inc. 

APPLICANT: Board of Regents, The University of Texas System 
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
FILE REFERENCE: 018781-007320US
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23
PRIOR APPLICATION NUMBER: US 60/252,235
PRIOR FILING DATE: 2000-11-20
PRIOR APPLICATION NUMBER: US 60/253,645
PRIOR FILING DATE: 2000-11-28
NUMBER OF SEQ ID NOS: 13
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 4
LENGTH: 672
TYPE: PRT

ORGANISM: Mus musculus
FEATURE:

OTHER INFORMATION: mouse ABCG8 (mABCG8) 
US-09-989-981A-4 



Query Match 21.0%; Score 697; DB 10; Length 672; 

Best Local Similarity 29.1%; Pred. No. 1.7e-57; 

Matches 195; Conservative 129; Mismatches 263; Indels 84; Gaps 18; 

Qy 15 LQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-ITSCR 61 

II I I I I : : : I I : : I : I I | | : : : : 

Db 17 LQDASGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFK 72 



Qy 62 QQWTRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGE 112 

I I I I I : : I : I I I I I : : I I I : : | | | | |: 

Db 73 IPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKMKSGQ 131 

Qy 113 VYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAV 171

: : : I I : : : I : : I I I I I : I I I I I I I : I : : I : I : I I I 

Db 132 IWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 

Qy 172 MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

: I I I I I : : I I : I : I I I I I I I I i III : I : : : I I I I : I I I I I : : 
Db 192 IAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNL 251 

Qy 232 VVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPC 291

I I I I : I I : I : : : : I I I M : : I : | | | : : : : | I : I : | : : | I : I I 
Db 252 VTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPC 311 

Qy 292 PEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKNIERM 345 

I : I I I I I I : I I I I : I : I I I I I : I : : I : : : : : I : 

Db 312 PRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTS 371 

Qy 346 KHLKT L PMVP FKT KD S PGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMG 397 

1:1: I : I : I I : : I : I I I I : : : : : | 

Db 372 THTVSLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMS 427 

Qy 398 LFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGL 457 

I : I : | : | | | : | : : | : | : | : : | : | | | 

Db 428 LI I GFLYYGHGAKQL- - SFMDTAALLFMI GALI PFNVI LDWS KCHS ERSMLYYELEDGL 485 

Qy 458 YQKWQMMLAYALHVLPFSWATMI FS SVCYWTLGLHPEVARFGYFSAALLAPHLI GEFL- 516 

I I I I I : I : : II II I I I I : : I 

Db 486 YTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL— HFLLVWLV 537 

Qy 517 TLVLLGIVQNPNI-VNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKY 570 

1:1 I : : I : : II : I : : I I I : I : : 

Db 538 VFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRW 597 

Qy 571 CSEILWNEFYGLNFT— CGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILY 628 

I I : : I I : I I : I : : I I : I I I 

Db 598 CFSGLMQIQFNGHLYTTQIGNFTFSILGDTM 1 SAMDLNSHPLY 640 

Qy 629 SFIPALVILGI 639 

: l:::|| 
Db 641 AIY — LIVIGI 649



RESULT 8 

US-09-989-981A-8 

Sequence 8, Application US/09989981A 
Publication No. US20030049730A1
GENERAL INFORMATION: 



APPLICANT: Hobbs, Helen H.
APPLICANT: Shan, Bei
APPLICANT: Barnes, Robert
APPLICANT: Tian, Hui
APPLICANT: Tularik Inc.
APPLICANT: Board of Regents, The University of Texas System
TITLE OF INVENTION: ABCG5 and ABCG8: Compositions and Methods of Use
FILE REFERENCE: 018781-007320US
CURRENT APPLICATION NUMBER: US/09/989,981A
CURRENT FILING DATE: 2002-07-23
PRIOR APPLICATION NUMBER: US 60/252,235
PRIOR FILING DATE: 2000-11-20
PRIOR APPLICATION NUMBER: US 60/253,645
PRIOR FILING DATE: 2000-11-28
NUMBER OF SEQ ID NOS: 13
SOFTWARE: PatentIn Ver. 2.1
SEQ ID NO 8 
LENGTH: 673 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE: 

OTHER INFORMATION: human ABCG8 (hABCG8) 
US-09-989-981A-8 

Query Match 21.0%; Score 697; DB 10; Length 673; 

Best Local Similarity 28.9%; Pred. No. 1.7e-57; 

Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16; 

Qy 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR- PWWD-ITSCRQQW 64 

II : III I I | : : | : : | : : | | : I I :: : : I 

Db 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75 

Qy 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115 

I : : : : I I I I I : : I : I I I I I : : I I I : : I I I I | : : : : 

Db 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134 

Qy 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

II: : : I : : I I : I I : I I I I I I I : I : : I : I : I I I : I I 

Db 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194 

Qy 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234

II II : I I : I : I I I I I I I I I I I I : I : : : I I I I : I I I I I : : I 

Db 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254 

Qy 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

I M : I I : I : : : : I I I I I : : I : I I I : : : : I hi I : : I INN:. 

Db 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314 

Qy 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348 

I I I I I I : I I I I : I : I : I : I : I : : I : : : : I : : : I 

Db 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362 

Qy 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390

I : : I II II : I : I I I I : : : 

Db 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421 

Qy 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ DRVGL L YQ FVGAT P YT GMLNAVNL F P VL R 446 

: : I : : I I : I I I I I : I : : I : : : | 

Db 422 AEACLMSMT I GFLYFG HGS I QL S FMDTAALLFMI GAL I P FNVI LDVI S KC YS ER 475 

Qy 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499

I : I : I I I I I I I I : I : II I I : I 

Db 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535 



Qy 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFL 549

: I I I I : I : I : : : I I I I : 

Db 536 WLWFCCRIMALAAAALLPTFHMAS FFS NALYNSFYLAG GFM 577 

Qy 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597

I : I I : I : : I I I : : I : I : : : : 

Db 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625 



RESULT 9 
US-10-090-455-7 

Sequence 7, Application US/10090455
Publication No. US20030027259A1 
GENERAL INFORMATION: 
APPLICANT: Chen, Hongyun 
APPLICANT: Le Bihan, Stephane 

TITLE OF INVENTION: NOVEL ABCG4 TRANSPORTER AND USES THEREOF 
FILE REFERENCE: 100103.406

CURRENT APPLICATION NUMBER: US/10/090,455
CURRENT FILING DATE: 2002-03-01
NUMBER OF SEQ ID NOS: 17

SOFTWARE: FastSEQ for Windows Version 4.0 
SEQ ID NO 7 
LENGTH: 673 
TYPE: PRT 

ORGANISM: Homo sapiens 
US-10-090-455-7 

Query Match 21.0%; Score 697; DB 14; Length 673; 

Best Local Similarity 28.9%; Pred. No. 1.7e-57; 

Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16; 

Qy 8 TPGGSMGLQVNRGSQSSLEGAPAT-APEPHSLGILHASYSVSHRVR-PWWD-ITSCRQQW 64 

II till I I | : : | : : | : : | | : I I : : : : I 

Db 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75 

Qy 65 TRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGEVYV 115 

I : : : : I I I I I : : I : I I I I I : : I I I : : I I I I I : : : : 

Db 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134

Qy 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174

II: : : I : : I I : I I : I I I I I I I : I : : I : I : I I I : I I 

Db 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194 

Qy 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234

II II : I I : I : I I I I I I I I I I I I : I : : : I I I I : I I I I I : : I 

Db 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254 

Qy 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294

I I I : | | : | : : : : | | | | | : : | : | | | : : : : | I : I I : : I Mill: 

Db 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314 

Qy 295 SNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHL 348 

I I I I I I : I I I I : I : I : I : I : I : : I : : : : I : : : I 

Db 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362 



Qy 349 KTLPM VPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRL 390



Db 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421 

Qy 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ D RVGL L YQ FVGAT P YT GMLNAW L F P VL R 44 6 

: : I : : I hill I I : I : : I : : : I 

Db 422 AEACLMSMT I GFLYFG HGS I QLS FMDTAALLFMI GALI P FNVI LDVI S KC YS ER 475 

Qy 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARF 499

I : I : I I I I I I I I : I : II I I : I 

Db 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535

Qy 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFL 549

: I I I I : I : I : : : I I II: 
Db 536 WLWFCCRIMALAAAALLPT FHMAS FFS NALYNSFYLAG GFM 577 

Qy 550 RNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597

|: : II : | : : | | | : : | : |::: : 

Db 578 INLSSLWTVPAWISKVSFLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625 



RESULT 10 
US-09-961-086-1 

; Sequence 1, Application US/09961086 
; Publication No. US20030036645A1 
; GENERAL INFORMATION: 

; APPLICANT: UNIVERSITY OF MARYLAND, BALTIMORE 
; APPLICANT: ROSS, Douglas D. 
; APPLICANT: DOYLE, L. Austin 
; APPLICANT: ABRUZZO, Lynne 

; TITLE OF INVENTION: BREAST CANCER RESISTANCE PROTEIN (BCRP) AND THE DNA 

; TITLE OF INVENTION: WHICH ENCODES IT 

; FILE REFERENCE: EP19376-019 

; CURRENT APPLICATION NUMBER: US/09/961,086

; CURRENT FILING DATE: 2001-09-21 

; PRIOR APPLICATION NUMBER: US 60/073,763 

; PRIOR FILING DATE: 1998-02-05 

; PRIOR APPLICATION NUMBER: PCT/US99/02577

; PRIOR FILING DATE: 1999-02-05 

; NUMBER OF SEQ ID NOS: 7

; SOFTWARE: PatentIn Ver. 2.1

; SEQ ID NO 1

; LENGTH: 655
; TYPE: PRT 

; ORGANISM: Homo sapiens 
US-09-961-086-1 

Query Match 20.5%; Score 682.5; DB 10; Length 655; 

Best Local Similarity 29.2%; Pred. No. 4e-56; 

Matches 182; Conservative 138; Mismatches 249; Indels 55; Gaps 18; 

Qy 21 S Q S S LE GAP AT AP EPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77 

II: I I I I I : : I :::::: I I : | | : : : I I : : : : : 

Db 13 SQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72 

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137

I : I I I : I I I : : I I I - I : I I : I : I I I I I : I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129



Qy 138 LLSSLTVRETLHYTALLAIRRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196 

: : : I I I I I I : : I I : : : : hill III : I : I : I I 

Db 130 VMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIRGVSGG 189 

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256

11:111 : I : I I : : I I I I I I I I I I I : : : I I : : : : I : : : I I I I I : 
Db 190 ERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312 

I : I I I : : I : I I : I I I I : I III : : I I I I : : I : : I : : : 

Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE IETSKR VQMIESAYKKSAICHKT LKNIERMKHLKTLPMVPF 356 

I : MM: : : : | | : : | I I : I : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL--LFFVLRVRSNVLKG 414

I : I : : I : I I : I I I ::: : : : I I : : : I M I 

Db 370 TT S FCHQLRWVSKRS FKNLLGNPQAS IAQI IVTWXGLVIGAI YFGLKNDST 421 

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473

II : I I : I : I : : : || | | | : : | | | : I I Ml 

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480 

Qy 474 FSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533

: : : : :| I : : I : I II I : I I : : : : : I I : : I : 

Db 481 MTMLPSI I FTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQSVVSVA 537 

Qy 534 VALLSIAGV--LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSN 591

I : : I I : : I I I I : : : II : : I I I I I I II I 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFIEKTCPG 615 

: : I I I : III 
Db 595 LNATGNNPCNYA TCTG 610 



RESULT 11 
US-10-405-806-13 

; Sequence 13, Application US/10405806 

; Publication No. US20030232362A1 

; GENERAL INFORMATION: 

; APPLICANT: KOMATANI, HIDEYA 

; APPLICANT: HARA, YOSHIKAZU 

; APPLICANT: KOTANI, HIDEHITO

; APPLICANT: NAKAGAWA, RINAKO 

; TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF

; FILE REFERENCE: 234985US0CONT

; CURRENT APPLICATION NUMBER: US/10/405,806 

; CURRENT FILING DATE: 2003-04-03 

; PRIOR APPLICATION NUMBER: PCT/JP01/08112

; PRIOR FILING DATE: 2001-09-18 

; PRIOR APPLICATION NUMBER: JP2000-303441

; PRIOR FILING DATE: 2000-10-03 

; NUMBER OF SEQ ID NOS: 17

; SOFTWARE: PatentIn version 3.2



SEQ ID NO 13 
LENGTH: 655 
TYPE: PRT 

ORGANISM: Artificial Sequence 
FEATURE:

OTHER INFORMATION: ABCG2 482T mutant sequence
US-10-405-806-13 

Query Match 20.5%; Score 682.5; DB 15; Length 655; 

Best Local Similarity 29.2%; Pred. No. 4e-56; 

Matches 182; Conservative 138; Mismatches 249; Indels 55; Gaps 18;

Qy 21 S Q S S L E GAP AT AP EPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77 

II: I I I I I : : I :::::: I I : I I : : : I I : : : : : 

Db 13 SQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72 

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137 

I : I I I : I I I : : I I I : : I : I I : I : I I I I I : I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129

Qy 138 LLSSLTWETLHYTALIAIRRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196 

: : : I I I I I I : : I I : : : : I : I I I III : I : I : I I 

Db 130 VMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIRGVSGG 189

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256 

11:111 : I : I I : : I II I I I I I I I I : : : I I : : : : I : : : I I I I I : 
Db 190 ERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312 

I : I I I : : I : I I : I I I I : I III : : I I I I : : I : : I : : : 

Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309

Qy 313 ERE IETSKR VQMIESAYKKSAICHKT LKNIERMKHLKTLPMVPF 356 

I : MM: : : : | | : : | | I : I : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL--LFFVLRVRSNVLKG 414

I : I : : I : I I : I I I : : : : : : M : : : I I : I 
Db 370 TT S FCHQLRWVSKRS FKNLLGNPQAS I AQI IVTVVLGLVT GAI YFGLKNDST 421 

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473

I I : I I : I : I : : : | | | | | : : | I I : I I : I I 

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480 

Qy 474 FSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533

: : : : : I I : : I : I I I I : I I : : : : : I I : : I : 

Db 481 MTMLPSIIFTCIVYFMLGLKPK7VDAFFVMMFTLM MVAYSAS SMALAIAAGQS WSVA 537 

Qy 534 VALLSIAGV--LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSN 591

I :: I I : : I I I I : : : M : : I I I I I I I I I 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFIEKTCPG 615 

:: I I I : II I 

Db 595 LNATGNNPCNYA TCTG 610 



RESULT 12 
US-09-981-353-35 

Sequence 35, Application US/09981353 
Patent No. US20020160382A1 
GENERAL INFORMATION: 
APPLICANT: Lasek, Amy W. 
APPLICANT: Jones, David A. 

TITLE OF INVENTION: GENES EXPRESSED IN COLON CANCER 
FILE REFERENCE: PA-0038 US

CURRENT APPLICATION NUMBER: US/09/981,353 
CURRENT FILING DATE: 2001-10-11 
NUMBER OF SEQ ID NOS: 194
SOFTWARE: PERL Program 
SEQ ID NO 35 
LENGTH: 655 
TYPE: PRT 

ORGANISM: Homo sapiens 
FEATURE : 

NAME/KEY: misc_feature

OTHER INFORMATION: Incyte ID No. US20020160382A1 5517972CD1 
US-09-981-353-35 

Query Match 20.5%; Score 680.5; DB 9; Length 655;

Best Local Similarity 29.2%; Pred. No. 6.1e-56; 

Matches 182; Conservative 137; Mismatches 250; Indels 55; Gaps 18; 

Qy 21 SQSSLEGAPATAP EPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77 

II: I I I I I : : | :::::: | | : I I : : : | | : : : : : 

Db 13 SQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72 

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137

I : I I I : I I I : : I I I : : I : I hi Ml I I I : I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129 

Qy 138 LLSSLTVRETLHYTALLAIRRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196

: : : I II I I I : : I I : : : : hill III : I : I : I I 

Db 130 VMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIRGVSGG 189 

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256

11:111 : I : | | : : I I I I I I II I I I : : : I I : : : : | : : : | | I | | : 
Db 190 ERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGT PAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312 

I : I I I : : I : I I : I I I I : I III : : I I I I : : I : : I : : : 

Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE IETSKR VQMIESAYKKSAICHKT LKNI ERMKHLKTLPMVPF 356 

I : MM: : : : | | : : | | | : | : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL — LFFVLRVRSNVLKG 414 

I : I : : I : I I : I II::: : : : I I : : : I I : I 

Db 370 TT S FCHQLRWVSKRS FKNLLGNPQAS IAQI I VTWLGLVI GAI YFGLKNDST 421 

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473

II : I I M : I : : M I I I I : : I I I : I I Ml 

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480



Qy 474 FSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533

Db 481 MRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQS WSVA 537

Qy 534 VALLSIAGV--LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSN 591

Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594

Qy 592 VSVTTNPMCAFTQGIQFIEKTCPG 615

Db 595 LNATGNNPCNYA TCTG 610



RESULT 13 
US-10-120-687-61 

; Sequence 61, Application US/10120687 
; Publication No. US20030082155A1 
; GENERAL INFORMATION: 

; APPLICANT: Massachusetts General Hospital
; TITLE OF INVENTION: Stem Cells of the Islets of Langerhans and Their Use in Treating Diabetes
; TITLE OF INVENTION: Mellitus
; FILE REFERENCE: 3284/1235B 

; CURRENT APPLICATION NUMBER: US/10/120,687 

; CURRENT FILING DATE: 2002-04-11 

; PRIOR APPLICATION NUMBER: US60/169082 

; PRIOR FILING DATE: 1999-12-06 

; PRIOR APPLICATION NUMBER: US 09/963,875 

; PRIOR FILING DATE: 2001-09-25 

; PRIOR APPLICATION NUMBER: US 60/215109 

; PRIOR FILING DATE: 2000-06-28 

; PRIOR APPLICATION NUMBER: US 60/238880 

; PRIOR FILING DATE: 2000-10-06 

; PRIOR APPLICATION NUMBER: US 09/731261 

; PRIOR FILING DATE: 2000-12-06 

; NUMBER OF SEQ ID NOS : 61 

; SOFTWARE: PatentIn version 3.1
; SEQ ID NO 61 
; LENGTH: 655 
; TYPE: PRT 

; ORGANISM: Homo sapiens 
US-10-120-687-61 

Query Match 20.5%; Score 680.5; DB 14; Length 655; 

Best Local Similarity 29.2%; Pred. No. 6.1e-56; 

Matches 182; Conservative 137; Mismatches 250; Indels 55; Gaps 18; 

Qy 21 SQSSLEGAPATAP EPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77

II: I I I I I : : I :::::: I I : I I : :: I I : : : : : 

Db 13 SQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72 

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137

I : II I : I I I : : I I I : : I : I I : I : I I I I I : I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129 



Qy 138 LLSSLTVRETLHYTALLAIRRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196



•* 'Mill I • • I I * * • 1*111 III • I I • I I 

Db 130 VMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIRGVSGG 189 

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256

11:111 : I : | | : : II I I I I I I I I I : : : I I : : : : I : : : I I I I I : 
Db 190 ERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312

I : I I I : : I : I I : I I I I : I III : : I I I I : : I : : I : : : 

Db 250 FKLFDSLTLLASGRMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE IETSKR VQMIESAYKKSAICHKT LKNIERMKHLKTLPMVPF 356 

I : II II: : : : I I : : I I I : I : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL — LFFVLRVRSNVLKG 414 

I : I : : I : I I : I I I : : : : : : I I : : : I I : I 
Db 370 TT SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTWLGLVIGAIYFGLKNDST 421 

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473

I I : I I : I : I : :: II I I I : : I I I : I I : I I

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480

Qy 474 FSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533 

: : : : I I : : I : I I I I : I I : : : : : I I : : I : 

Db 481 MRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQS WSVA 537 

Qy 534 VALLSIAGV--LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSN 591

I : : I I : : I I I I : : : I I : : I I I I I I I I I 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFIEKTCPG 615 

:: I I I : III 
Db 595 LNATGNNPCNYA TCTG 610 



RESULT 14 
US-10-405-806-2 

; Sequence 2, Application US/10405806 

; Publication No. US20030232362A1 

; GENERAL INFORMATION: 

; APPLICANT: KOMATANI, HIDEYA
; APPLICANT: HARA, YOSHIKAZU
; APPLICANT: KOTANI, HIDEHITO
; APPLICANT: NAKAGAWA, RINAKO
; TITLE OF INVENTION: DRUG RESISTANT GENE AND USE THEREOF
; FILE REFERENCE: 234985US0CONT
; CURRENT APPLICATION NUMBER: US/10/405,806
; CURRENT FILING DATE: 2003-04-03
; PRIOR APPLICATION NUMBER: PCT/JP01/08112
; PRIOR FILING DATE: 2001-09-18
; PRIOR APPLICATION NUMBER: JP2000-303441
; PRIOR FILING DATE: 2000-10-03
; NUMBER OF SEQ ID NOS: 17
; SOFTWARE: PatentIn version 3.2
; SEQ ID NO 2
; LENGTH: 655
; TYPE: PRT
; ORGANISM: Homo sapiens
US-10-405-806-2 



Query Match 20.5%; Score 680.5; DB 15; Length 655; 

Best Local Similarity 29.2%; Pred. No. 6.1e-56; 

Matches 182; Conservative 137; Mismatches 250; Indels 55; Gaps 18; 

Qy 21 SQSSLEGAPATAP EPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77

II: I I I I I : : I :::::: I I : I I : : : I I : : : : :

Db 13 SQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137 

I : | | | : | | | : : | | | : : | : I I : I : I I I I I : I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129 

Qy 138 LLSSLTVRETLHYTALLAI RRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196 

: : : I I I I I I : : I I : : : : I : I I I III : I : h! I 

Db 130 VMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIRGVSGG 189 

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256

I I : I I I : I : I I : : I I I I I I I I I II : : : I I : : : : I : : : I I I I I : 
Db 190 ERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312 

I : I I I : : I : I I : I I I I : I Ml : : I I I I : : I : : I : : : 

Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE IETSKR VQMI ESAYKKSAI CHKT LKNIERMKHLKTLPMVPF 356 

I : II II: : : : I I : : I I I : I : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL — LFFVLRVRSNVLKG 414 

I : I : : I : I I : I I I : : : : : : I I : : : I I : I 
Db 370 TT SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTWLGLVIGAIYFGLKNDST 421 

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473 

I I : I I : I : I : : : I I I I I : : I I I : I I : I I 

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480 

Qy 474 FSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533 

: : : : I I : : I : I I I I : I I : : : : : I I : : I : 

Db 481 MRMLPSI IFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAI AAGQS WSVA 537 

Qy 534 VALLSIAGV — LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSN 591 

I : : I I : : | I I I : : : I I : : I I I I I I I I I 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFIEKTCPG 615 

: : I I I : III 
Db 595 LNATGNNPCNYA TCTG 610 



RESULT 15 
US-09-866-866A-10 

; Sequence 10, Application US/09866866A 
; Patent No. US20020102244A1 



GENERAL INFORMATION: 
APPLICANT: Sorrentino, Brian 
APPLICANT: Schuetz, John 

TITLE OF INVENTION: A Method of Identifying and/or Isolating Stem Cells 
FILE REFERENCE: 1340-1-021CIP2
CURRENT APPLICATION NUMBER: US/09/866,866A
CURRENT FILING DATE: 2001-08-30
PRIOR APPLICATION NUMBER: 09/584,586
PRIOR FILING DATE: 2000-05-31
PRIOR APPLICATION NUMBER: PCT/US99/11825
PRIOR FILING DATE: 1999-05-27
PRIOR APPLICATION NUMBER: 60/086,988
PRIOR FILING DATE: 1998-05-28
NUMBER OF SEQ ID NOS: 27
SOFTWARE: PatentIn version 3.0
SEQ ID NO 10 
LENGTH: 655 
TYPE: PRT 

ORGANISM: Homo sapien 
US-09-866-866A-10 

Query Match 20.3%; Score 674.5; DB 9; Length 655; 

Best Local Similarity 29.0%; Pred. No. 2.3e-55; 

Matches 181; Conservative 137; Mismatches 251; Indels 55; Gaps 18; 

Qy 21 SQSSLEGAPATAP EPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77 

II: I I I I : : I :::::: I I : I I : : : I I : : : : : 

Db 13 SQGNTNGFPATVSNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72 

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137 

I : I I I : I I I :: I I I :: I : I I : I : I I I I I : I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129 

Qy 138 LLSSLTVRETLHYTALLAIRRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196

: : : I I I I I I : : I I : : : : I : I I I III : I : I : I I 

Db 130 VKGTLTVRENLQFSAALRLATTMTNHEKNERINRVIEELGLDIWADSKVGTQFIRGVSGG 189 

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256

11:111 : I : II : : I I I I I I I I I I I : : : I I : : : : I : : : I I I I I : 
Db 190 ERKRTSIGMELITDPSILSLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312 

I : I I I : : I : I I : I I I I : I III : : I I I I : : I : : I : : : 

Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE IETSKR VQMIESAYKKSAICHKT LKNIERMKHLKTLPMVPF 356 

I : II II: : : : I I : : I I I : I : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL — LFFVLRVRSNVLKG 414 

I : I : : I : I I : I I I : : : : : : I I : : : I I : I 
Db 370 TT SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTWLGLVIGAIYFGLKNDST 421 

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473 

I I : I I : I : I : :: | I | | | : : I I I : | | : | | 

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480 



Qy 474 FSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533 

: : : : I I : : I : I I I I : I I : : : : : I I : : I : 

Db 481 MRMLPSIIFTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAIAAGQS WSVA 537

Qy 534 VALLSIAGV — LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSN 591 

I : : I I : : I I I I : : : I I : : I I I I I I I I I 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFIEKTCPG 615 

: : I I I : III 
Db 595 LNATGNNPCNYA TCTG 610 



Search completed: February 27, 2004, 07:34:06 
Job time : 30.2557 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 



Run on:         February 27, 2004, 06:40:43 ; Search time 36.1394 Seconds
                (without alignments)
                5683.620 Million cell updates/sec

Title:          US-09-989-981A-6
Perfect score:  3326
Sequence:       1 MGDLSSLTPGGSMGLQVNRG..........PALVILGIWFKIRDHLISR 651



Scoring table: BLOSUM62 

Gapop 10.0 , Gapext 0.5 

Searched: 1017041 seqs, 315518202 residues 

Total number of hits satisfying chosen parameters: 1017041 

Minimum DB seq length: 0 

Maximum DB seq length: 2000000000 



Post-processing: Minimum Match 0% 

Maximum Match 100% 
Listing first 45 summaries 



Database :      SPTREMBL_25:*
                1:  sp_archea:*
                2:  sp_bacteria:*
                3:  sp_fungi:*
                4:  sp_human:*
                5:  sp_invertebrate:*
                6:  sp_mammal:*
                7:  sp_mhc:*
                8:  sp_organelle:*
                9:  sp_phage:*
                10: sp_plant:*
                11: sp_rodent:*
                12: sp_virus:*
                13: sp_vertebrate:*
                14: sp_unclassified:*
                15: sp_rvirus:*
                16: sp_bacteriap:*
                17: sp_archeap:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
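
The header above names a Smith-Waterman ("sw") model with Gapop 10.0 and Gapext 0.5. As a rough illustration of what such a local-alignment score is, the following Python sketch computes a best local score with affine gap penalties. It is not the GenCore implementation: the function names are hypothetical, a toy match/mismatch function stands in for BLOSUM62, and the convention that a gap of length k costs gap_open + k * gap_extend is an assumption, since the report does not state how the two parameters combine.

```python
def toy_score(a, b):
    """Stand-in substitution score (assumption; the report uses BLOSUM62)."""
    return 5.0 if a == b else -4.0


def smith_waterman_affine(query, target, score=toy_score,
                          gap_open=10.0, gap_extend=0.5):
    """Return the best local alignment score using affine gap penalties.

    Assumption: a gap of length k costs gap_open + k * gap_extend.
    Only two rows of each matrix are kept, so memory is O(len(target)).
    """
    neg_inf = float("-inf")
    cols = len(target) + 1
    h_prev = [0.0] * cols      # best score of alignments ending at (i-1, j)
    f_prev = [neg_inf] * cols  # same, but ending in a gap that skips query residues
    best = 0.0
    for a in query:
        h_cur = [0.0] * cols
        f_cur = [neg_inf] * cols
        e = neg_inf            # ending in a gap that skips target residues
        for j in range(1, cols):
            # Open a new gap from the best cell, or extend the running gap.
            e = max(h_cur[j - 1] - gap_open - gap_extend, e - gap_extend)
            f_cur[j] = max(h_prev[j] - gap_open - gap_extend,
                           f_prev[j] - gap_extend)
            diag = h_prev[j - 1] + score(a, target[j - 1])
            # A local alignment may restart anywhere, hence the floor at 0.
            h_cur[j] = max(0.0, diag, e, f_cur[j])
            best = max(best, h_cur[j])
        h_prev, f_prev = h_cur, f_cur
    return best


if __name__ == "__main__":
    print(smith_waterman_affine("AAAATTTT", "AAAACCCTTTT"))
```

Replacing toy_score with a lookup into a real BLOSUM62 table would bring the sketch closer to the scoring table named in the header; the gap convention would still need to be checked against the tool's documentation.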



SUMMARIES 

                          %
Result                Query
   No.     Score      Match  Length  DB  ID       Description
--------------------------------------------------------------------
     1    2726.5       82.0     652  11  Q7TSR8   Q7tsr8 mus musculu
     2     704         21.2     673  11  Q8R543   Q8r543 mus musculu
     3     701         21.1     672  11  Q7TSR7   Q7tsr7 mus musculu
     4     697         21.0     672  11  Q7TSR6   Q7tsr6 mus musculu
     5     691         20.8     672  11  Q8CIQ5   Q8ciq5 rattus norv
     6     680.5       20.5     655   4  Q96TA8   Q96ta8 homo sapien
     7     679.5       20.4     655   4  Q8IX16   Q8ix16 homo sapien
     8     672.5       20.2     655   4  Q96LD6   Q96ld6 homo sapien
     9     668.5       20.1     656   6  Q8MIB3   Q8mib3 sus scrofa
    10     663         19.9     657  11  Q7TMS5   Q7tms5 mus musculu
    11     662         19.9     801   5  Q8T691   Q8t691 dictyosteli
    12     660         19.8     657  11  Q9R004   Q9r004 mus musculu
    13     649.5       19.5     725  10  Q9M3D6   Q9m3d6 arabidopsis
    14     639.5       19.2     643   5  Q7YYX5   Q7yyx5 cryptospori
    15     628         18.9     691  10  Q8RWI9   Q8rwi9 arabidopsis
    16     626.5       18.8     657  11  Q80XF3   Q80xf3 rattus norv
    17     623         18.7     679   5  Q8IS30   Q8is30 bactrocera
    18     622.5       18.7     657  11  Q80W57   Q80w57 rattus norv
    19     622         18.7     668  10  Q9ARU4   Q9aru4 oryza sativ
    20     620.5       18.7     657  11  Q80ST1   Q80st1 rattus norv
    21     620         18.6     692  10  Q7XUM2   Q7xum2 oryza sativ
    22     618.5       18.6     672  10  Q9LI82   Q9li82 arabidopsis
    23     617         18.6     727  10  Q9FNB5   Q9fnb5 arabidopsis
    24     615.5       18.5     723  10  Q8LNT5   Q8lnt5 oryza sativ
    25     615         18.5     692   5  P91892   P91892 aedes aegyp
    26     614.5       18.5     703  10  Q8RXN0   Q8rxn0 arabidopsis
    27     614         18.5     594  10  Q9LJC3   Q9ljc3 arabidopsis
    28     614         18.5     720  10  Q9M2V7   Q9m2v7 arabidopsis
    29     610.5       18.4     725  10  Q9ZU35   Q9zu35 arabidopsis
    30     610.5       18.4     725  10  Q9ASR9   Q9asr9 arabidopsis
    31     610         18.3     679   5  Q9BH97   Q9bh97 ceratitis c
    32     608         18.3     708  10  Q9M2V5   Q9m2v5 arabidopsis
    33     602.5       18.1     654  10  Q9LIW2   Q9liw2 oryza sativ
    34     600.5       18.1     670   5  O77423   O77423 bactrocera
    35     600         18.0     604   5  Q8MRJ2   Q8mrj2 drosophila
    36     600         18.0     787  10  Q8H8V7   Q8h8v7 oryza sativ
    37     597         17.9     590  10  Q9MAH4   Q9mah4 arabidopsis
    38     595.5       17.9     658   5  O16574   O16574 caenorhabdi
    39     595.5       17.9     687   5  Q94960   Q94960 drosophila
    40     595.5       17.9     785   4  Q96L76   Q96l76 homo sapien
    41     592         17.8     610   5  P90746   P90746 caenorhabdi
    42     591.5       17.8     740  10  O80946   O80946 arabidopsis
    43     589.5       17.7     646  10  Q9C6R7   Q9c6r7 arabidopsis
    44     588.5       17.7     646  11  Q8K4E1   Q8k4e1 mus musculu
    45     588.5       17.7     648  10  Q9C6W5   Q9c6w5 arabidopsis
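
The "% Query Match" column in the summaries above is consistent with the hit score divided by the Perfect score from the header (for example, 2726.5 / 3326 gives 82.0%). The sketch below reproduces that arithmetic together with the post-processing bounds (Minimum Match 0%, Maximum Match 100%) and the "Listing first 45 summaries" cut-off. The function names and the rounding to one decimal place are assumptions made for illustration, not the tool's documented behaviour.

```python
def query_match_percent(score, perfect_score):
    """Score as a percentage of the perfect score, to one decimal place."""
    return round(100.0 * score / perfect_score, 1)


def summarize(hits, perfect_score, min_match=0.0, max_match=100.0, limit=45):
    """Filter (id, score) pairs by percent match and keep the top `limit`.

    Returns (id, score, percent) tuples sorted by descending score.
    """
    rows = []
    for hit_id, score in hits:
        pct = query_match_percent(score, perfect_score)
        if min_match <= pct <= max_match:
            rows.append((hit_id, score, pct))
    rows.sort(key=lambda row: row[1], reverse=True)
    return rows[:limit]


if __name__ == "__main__":
    # Scores taken from the first three summary rows of this report.
    hits = [("Q8R543", 704.0), ("Q7TSR8", 2726.5), ("Q7TSR7", 701.0)]
    for row in summarize(hits, perfect_score=3326):
        print(row)
```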



ALIGNMENTS 



RESULT 1 
Q7TSR8 

ID Q7TSR8 PRELIMINARY; PRT; 652 AA. 

AC Q7TSR8; 

DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 



DE ATP-binding cassette sub-family G member 5. 

GN ABCG5 . 

OS Mus musculus (Mouse) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=I/LnJ; TISSUE=Liver;
RA Wittenburg H., Lyons M.A., Li R., Churchill G.A., Carey M.C.,
RA Paigen B.;
RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
RT Mice.";
RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AY195872; AAO45093.1; -.
KW ATP-binding.
SQ SEQUENCE 652 AA; 73236 MW; 0125FB617DE296B9 CRC64;

Query Match 82.0%; Score 2726.5; DB 11; Length 652;
Best Local Similarity 79.4%; Pred. No. 5.5e-193;
Matches 518; Conservative 68; Mismatches 65; Indels 1; Gaps 1;

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

11:1 I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

Db 1 MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

Qy 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

I : I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I II II I : I : I I I 
Db 61 CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRCTGTLEGDVFVNGCE 120 

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

I I I : I I I I I I I I I I I I I I I I I I I I I I I llhll: I : : I I I I I I I I I I I I I 
Db 121 LRRDQFQDCFSYVXQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 180 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

I I I : : I I : I : I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I III 
Db 181 VADQVIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 240 

Qy 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

I I : I I I :: I I I I I I I I I I I I I I I I I I : : I I I : I I I I I I II I I I : I I I I I I I I I I I I I 
Db 241 RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I I I I I I I I I I I I : I I I I I I I I I I I : I I I : I : I I II I = t I I I ::IMII MINI 
Db 301 FYMDLTSVDTQSREREIETYKRVQMLESAFKESDIYHKILENIERARYLKTLPTVPFKTK 360 

Qy 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

I I I : I I I I I I I I I I I I I I : II I III I I : I I I I I I I I I : I : : I I I : : I I I I I : II I 
Db 361 DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

Qy 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWAT 479

I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I 1111:111 II I I I I :: I I 
Db 421 VGLLYQFVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHALPFSIIAT 480 



Qy 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSI 539

: I I I I I I I I I I I I : I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I

Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540



Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVWEFYGLNFTCGSSNVSVTTNPM 599 

: I : I : I I I I : I I I I I I I I I I I : I I I I I II I I I I I I I I I I I I I I I I I II : : : I I 
Db 541 SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLN-FTCGESNTTMLNHPM 600 

Qy 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

I I I I I : : I I II I I I II I I I I I I I I I I I I I I I I I I I I I I : I I : I I : I I I I 
Db 601 CAITQGVEFIEKTCPGATSRFTANFLILYGFIPALVILGIVI FKVRDYLISR 652 
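
The statistics printed for RESULT 1 (Matches 518; Conservative 68; Mismatches 65; Indels 1) are consistent with "Best Local Similarity" being the identical positions divided by the total number of alignment columns: 518 / 652 gives 79.4%. The short sketch below shows that calculation. The formula is inferred from the numbers in this report and is an assumption, not a documented definition, and the function name is hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of alignment columns that are exact matches (one decimal).

    Assumption: the alignment length is the sum of the four reported counts.
    """
    columns = matches + conservative + mismatches + indels
    return round(100.0 * matches / columns, 1)


if __name__ == "__main__":
    # Counts for RESULT 1 (Q7TSR8) as printed in this report.
    print(best_local_similarity(518, 68, 65, 1))
```

The same formula also reproduces the 29.2% reported for the hits with Matches 182; Conservative 137; Mismatches 250; Indels 55.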



RESULT 2 
Q8R543 

ID Q8R543 PRELIMINARY; PRT; 673 AA. 

AC Q8R543; 

DT 01-JUN-2002 (TrEMBLrel. 21, Created) 

DT 01-JUN-2002 (TrEMBLrel. 21, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Sterolin 2. 

GN ABCG8 . 

OS Mus musculus (Mouse) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=129/Sv; 

RA Lu K., Zhou Y., Lee M.-H., Patel S.B.;

RT "Molecular cloning, genomic structure and characterization of novel 



RT mouse head-to-head tandem ABC transporters.";
RL Submitted (FEB-2001) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AF351811; AAL82898.1; -.
DR EMBL; AF351799; AAL82898.1; JOINED.
DR EMBL; AF351800; AAL82898.1; JOINED.
DR EMBL; AF351801; AAL82898.1; JOINED.
DR EMBL; AF351802; AAL82898.1; JOINED.
DR EMBL; AF351803; AAL82898.1; JOINED.
DR EMBL; AF351804; AAL82898.1; JOINED.
DR EMBL; AF351805; AAL82898.1; JOINED.
DR EMBL; AF351807; AAL82898.1; JOINED.
DR EMBL; AF351808; AAL82898.1; JOINED.
DR EMBL; AF351809; AAL82898.1; JOINED.
DR EMBL; AF351810; AAL82898.1; JOINED.
DR GO; GO:0016020; C:membrane; IEA.
DR GO; GO:0005524; F:ATP binding; IEA.
DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.
DR GO; GO:0006810; P:transport; IEA.
DR InterPro; IPR003439; ABC_transporter.
DR Pfam; PF00005; ABC_tran; 1.
DR ProDom; PD000006; ABC_transporter; 1.
DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.
DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

SQ SEQUENCE 673 AA; 76008 MW; FA08340445DF259C CRC64; 

Query Match 21.2%; Score 704; DB 11; Length 673; 

Best Local Similarity 28.7%; Pred. No. 1.8e-43; 

Matches 195; Conservative 130; Mismatches 261; Indels 94; Gaps 17; 



Qy 11 GSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-I 57

I : : : : I I I I : : : I I : : I : I I I I : : :

Db 14 GTVLQDASQGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQL 69

Qy 58 TSCRQQWTRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF 109

: I : : : : | I I I I : : I : I I I I I : : | | | : : | | I I

Db 70 AQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKM 128

Qy 110 -LGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKK 167

I : : : : I I : : : I : : I I I I I : II I I I I I : I : : I : I :

Db 129 KSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKR 188

Qy 168 VEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMT 227

I I I : I I I I I : : I I : I : I I II II I II I I I : I : : : I I I I : I I I I
Db 189 VEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFT 248

Qy 228 ANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDC 287

I : : I I I I : I I : I : : : : I I I I I : : I : II I : : : : I I : I : I : : I
Db 249 AHNLVTTLSRLAXGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSI 308

Qy 288 GYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKN 341

I : I I I : I I I I I I : I I I I : I : I II I I : I : : I : : : : : I

Db 309 GHPCPRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKE 368

Qy 342 IERMKHLKTLPMVPFKTKDS PGVFSKLGVLLRRVTRNLVRNKLAVITRLLQN 393

: I :| : |:|: II: : hll I I : :: :

Db 369 LNTSTHTVSLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEA 424

Qy 394 LIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQES 453

: I I : I : | : | | | : | : : | : | : | : : |

Db 425 CLMSLIIGFLYYGHGAKQL--SFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYEL 482

Qy 454 QDGLYQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIG 513

:IMI I I II :[:: II I I II
Db 483 EDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRP VPELFL 529

Qy 514 EFLTLVLLGIVQNPNIVNSVVALLS IAGVLVGS GFLRNIQEMPIPFKI 561

III: | : : | : | | | I I : I : : I

Db 530 LHFLLWLWFCCRNMAIJVVSAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAW 589

Qy 562 ISYFTFQKYCSEILWNEFYGLNFT--CGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSR 619

I I : I : : I I : : I I : I I : I : : I |
Db 590 ISKLSFLRWCFSGLMQIQFNGHLYTTQIGNFTFSILGDTM ISA 632

Qy 620 FTMNFLILYSFIPALVILGI 639

: I II: I : : : I I

Db 633 MDLNSHPLYAIY--LIVTGI 650



RESULT 3 
Q7TSR7 

ID Q7TSR7 PRELIMINARY; PRT; 672 AA. 

AC Q7TSR7;

DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 



DE ATP-binding cassette sub-family G member 8. 

GN ABCG8. 

OS Mus musculus (Mouse) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=I/LnJ; TISSUE=Liver; 

RA Wittenburg H., Lyons M.A., Li R. , Churchill G.A., Carey M.C., 

RA Paigen B. ; 

RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone 

RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred 

RT Mice."; 

RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AY196215; AAO45095.1; -.
KW ATP-binding.
SQ SEQUENCE 672 AA; 75805 MW; E5B30B5890200A41 CRC64;

Query Match 21.1%; Score 701; DB 11; Length 672;
Best Local Similarity 29.2%; Pred. No. 3e-43;
Matches 196; Conservative 129; Mismatches 262; Indels 84; Gaps 18;

Qy 15 LQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-ITSCR 61 

II I I I I : : : I I : : I : I I I I : : : : 

Db 17 LQDASGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFK 72 

Qy 62 QQWTRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGE 112 

I : : : : | I I I I : : I : I I I I I : : I I I : : I I I | | : 

Db 73 IPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKMKSGQ 131 

Qy 113 VYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAV 171

= : s I I : : : I : : I I I I I : I I I I I I I : I : : I : hill 

Db 132 IWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 

Qy 172 MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

: I I I I I : : I I : I : I I I II I I I I I I I : I : : : I I I I : I II I I : : 
Db 192 IAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNL 251 

Qy 232 VVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPC 291

I I I I : I I : I : : : : | | | | I : : I : I I I : : : : I I : I : I : : I I : I I 
Db 252 VTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSIGHPC 311 

Qy 292 PEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKNIERM 345

I : I I I I I I : I I I I : I : I I I I I : I : : I : : : : : | : 

Db 312 P RYSN PAD FYVDLT S I DRRS KEREVAT VEKAQ S LAAL FLEKVQGFDD FLWKAEAKELNT S 371 

Qy 346 KHLKTLPMVPFKTKDS PGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMG 397 

I : I : I : I : I I : : I : I I I I : : : : : | 

Db 372 THTVSLTL TQDTDCGTAAELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMS 427 

Qy 398 LFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGL 457 

I : I : | : | ||: |: :|: |: |:: | :||| 

Db 428 LI IGFLYYGHGAKQL — S FMDTAALLFMI GALI PFNVI LDWSKCHSERSMLYYELEDGL 485 

Qy 458 YQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFL- 516 

I I I I I : I : : II II I II I I : : I 



Db 486 YTAGPYFFAKILGELPEHCAYVTIYAMPIYWLTNLRPVPELF LL--HLLLVWLV 537



Qy 517 TLVLLGIVQNPNI-VNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKY 570 

i : I I : : I : : II : I : : I I I : I : : 

Db 538 VFCCRTMALAASAMLPT FHMS S FFCNAL YNS FYLTAGFMINLDNLWI VPAWI S KLS FLRW 597 

Qy 571 CSEILWNEFYGLNFT--CGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILY 628

I I : : I I : I I : I : : I I : I I I 

Db 598 CFSGLMQIQFNGHLYTTQI GNFTFS ILGDTM ISAMDLNSHPLY 640 

Qy 629 SFIPALVILGI 639 

: |:::|| 
Db 641 AIY — LIVIGI 649 

RESULT 4 
Q7TSR6 



ID Q7TSR6 PRELIMINARY; PRT; 672 AA. 

AC Q7TSR6; 

DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette sub-family G member 8.
GN ABCG8.
OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=PERA/Ei; TISSUE=Liver;
RA Wittenburg H., Lyons M.A., Li R., Churchill G.A., Carey M.C.,
RA Paigen B.;
RT "Primary Roles of FXR and ABCG5/ABCG8 in Cholesterol Gallstone
RT Susceptibility: Evidence from a Cross of PERA/Ei and I/Ln Inbred
RT Mice.";
RL Submitted (DEC-2002) to the EMBL/GenBank/DDBJ databases.
DR EMBL; AY196216; AAO45096.1; -.
KW ATP-binding.
SQ SEQUENCE 672 AA; 75867 MW; CAB720502EA8FE21 CRC64;

Query Match 21.0%; Score 697; DB 11; Length 672;
Best Local Similarity 29.1%; Pred. No. 5.9e-43;
Matches 195; Conservative 129; Mismatches 263; Indels 84; Gaps 18;

Qy 15 LQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-ITSCR 61 

II I I I I : : : I I : : I : I I I I : : : 

Db 17 LQDASGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQLAQFK 72 

Qy 62 QQWTRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF-LGE 112 

I : : : : | I I I I : : I : I I I I I : : I I I : : I I | I | : 

Db 73 IPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKMKSGQ 131 

Qy 113 VYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAV 171 

: : : I I : : : I : : I I I I I : I I I I I I I : I : : I : I : I I I 

Db 132 IWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDV 191 



Qy 172 MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI 231 

: I I I I I : : I I : I : I I I I I I I I I I I I : I : : : I I I I : I I I I I : : 
Db 192 IAELRLRQCANTRVGNTWRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNL 251 

Qy 232 VVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPC 291

I I M : | | : | : : : : | | | | | : : | : | | | : : : : | | : | : I : : I I : I I 
Db 252 VTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLI^SGTPIYLGAAQQMVQYFTSIGHPC 311 

Qy 292 PEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKNIERM 345

I : I I I I I I : I I I I : I : I I I I I : I : : I : : : : : I : 

Db 312 PRYSNPADFYVDLTSIDRRSKEREVATVEKAQSLAALFLEKVQGFDDFLWKAEAKELNTS 371

Qy 346 KHLKTLPMVPFKTKDS PGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMG 397 

I : I : I : I : I I : : I : I I I I : :: : : I 

Db 372 THTVSLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEACLMS 427 

Qy 398 LFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGL 457 

I : I : | : | | | : | : : | : | : | : : | : | | | 

Db 428 LI IGFLYYGHGAKQL — S FMDTAALLFMI GALI PFNVI LDWSKCHSERSMLYYELEDGL 485 

Qy 458 YQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFL- 516 

I I I I I : I :: II II I II I : : I 

Db 486 YTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLLVWLV 537 

Qy 517 TLVLLGIVQNPNI -VNS WALLS IAGVLVGSGFLRNIQEMPI PFKI I S YFTFQKY 570 

hi I : : I : : | | : | : : | | | : | : : 

Db 538 VFCCRTMALAASAMLPTFHMSSFFCNALYNSFYLTAGFMINLDNLWIVPAWISKLSFLRW 597 

Qy 571 CSEILWNEFYGLNFT — CGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNFLILY 628 

I I : : I I : I I : I : : I I : I II 

Db 598 CFSGLMQIQFNGHLYTTQIGNFTFSILGDTM ISAMDLNSHPLY 640 

Qy 629 SFIPALVILGI 639 

: I : : : I I 

Db 641 AIY — LIVIGI 649 

RESULT 5 
Q8CIQ5 



ID Q8CIQ5 PRELIMINARY; PRT; 672 AA. 

AC Q8CIQ5; 

DT 01-MAR-2003 (TrEMBLrel. 23, Created) 

DT 01-MAR-2003 (TrEMBLrel. 23, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Sterolin 2. 

GN ABCG8 . 

OS Rattus norvegicus (Rat) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Rattus. 

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Sprague-Dawley; 

RA Yu H., Lu K., Lee M., Pandit B., Patel S.B.;

RT "The rat Abcg5 and Abcg8: characterization, chromosomal assignment and

RT genetic variation in sitosterolemic rats.";

RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases.



DR EMBL; AY145899; AAN64276.1; -. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

SQ SEQUENCE 672 AA; 75906 MW; 2FE0846E71BD9D47 CRC64;

Query Match 20.8%; Score 691; DB 11; Length 672;

Best Local Similarity 28.3%; Pred. No. 1.6e-42;

Matches 189; Conservative 126; Mismatches 264; Indels 88; Gaps 15;

Qy 23 SSLEGAPATAPEPHSLGILHASYSVSHRVR PW WDITSC 60 

III: : :: :|| :: I : II II | 

Db 21 SSLQDSVFSSESDNSLYFTYSGQSNTLEVRDLTYQVDMASQVPWFEQLAQFKLPWRSRGS 80 

Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

: I I : : : I I II I : : I : I I : I I : I I I I : : I I I : : : : I I : 

Db 81 QDSWDLGI-RNLSFKVRSGQMLAIIGSAGCGRATLLDVITGRDHGGKMKSGQIWINGQPS 139 

Qy 121 RREQFQDCFS YVLQSDTLLS SLTVRETLHYTALLAI RRGNPGS F QKKVEAVMAEL 175 

: I I :: I I I I I : I I I I I I I : I : : I : I I : I I I : I I I 

Db 140 T PQLI QKCVAHVRQQDQLLPNLTVRETLT FI AQMRL PKTFSQAQRDKRVEDVIAEL 195 

Qy 176 SLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLL 235

I I : : I I : I : I I I I I I I I I I I I : I : : : I I I I : I I I I I : : I I 
Db 196 RLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVRTL 255 

Qy 236 VELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHS 295

I I : I I : I : : : : I I I I I : : I : I I I : : : : I I : I I : : I I I I I I : I 

Db 256 SRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGVAQHMVQYFTSIGYPCPRYS 315 

Qy 296 NPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKNIERMKHLK 349

I I I I I : I I I I : I : I I I : I : I ::::::: : : I : : : 
Db 316 NPADFWDLTSIDRRSKEQEVATMEKARLLAALFLEKVQGFDDFLWKAEAKSLD TG 371 

Qy 350 TLPMVPFKTKDS PGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLL 401

I : I : I : I I : : I : I I I I : : : : I I : 

Db 372 TYAVSQTLTQDTNCGTAAELPGMIQQFTTLIRRQISNDFRDLPTLFIHGAEACLMSLIIG 431 

Qy 402 FFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKW 461

I I : I II: I : : I : I : I : : I : I I I I 

Db 432 FLYYGHADKPL--SFMDMAALLFMIGALIPFNVILDWSKCHSERSLLYYELEDGLYTAG 489

Qy 462 QMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFL 516

I I I I : I : I I I I | | : : | 

Db 490 PYFFAKVLGELPEHCAYVIIYGMPIYWLTNLRP GPELFLLHFMLLWLWFCC 541

Qy 517 -TLVLLGIVQNPNI-VNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEI 574

I : I I : : I : : I I : I : : I I I : | : : | 

Db 542 RTMALAASAMLPTFHMSSFCCNALYNSFYLTAGFMINLNNLWIVPAWISKMSFLRWCFSG 601 



Qy 575 LWNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPG--ATSRFTMNFLILYSFIP 632



I • - I I • I I * * • II . - I II- 

Db 602 LMQIQFNGHIYTTQIGNLTFSV P GDAMVTAMD LN S H P LYAI Y- 643 

Qy 633 ALVILGI 639 

|:::|| 

Db 644 -LIVIGI 649 



RESULT 6 
Q96TA8 

ID Q96TA8 PRELIMINARY; PRT; 655 AA. 

AC Q96TA8; 

DT 01-DEC-2001 (TrEMBLrel. 19, Created) 

DT 01-DEC-2001 (TrEMBLrel. 19, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette superfamily G (White) member 2 (Hypothetical 

DE protein).

GN ABCG2.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=21201983; PubMed=11306452;

RA Komatani H., Kotani H., Hara Y., Nakagawa R., Matsumoto M.,

RA Arakawa H., Nishimura S.; 

RT "Identification of breast cancer resistant protein/mitoxantrone 

RT resistance/placenta-specific, ATP-binding cassette transporter as a 

RT transporter of NB-506 and J-107088, topoisomerase I inhibitors with an 

RT indolocarbazole structure."; 

RL Cancer Res. 61:2827-2832(2001). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Pancreatic carcinoma; 

RA Strausberg R.;

RL Submitted (JAN-2002) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AB051855; BAB46933.1; -. 

DR EMBL; BC021281; AAH21281.1; -. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW Hypothetical protein; ATP-binding. 

SQ SEQUENCE 655 AA; 72314 MW; A8AF66B96034C5A8 CRC64; 

Query Match 20.5%; Score 680.5; DB 4; Length 655; 

Best Local Similarity 29.2%; Pred. No. 9.4e-42; 

Matches 182; Conservative 137; Mismatches 250; Indels 55; Gaps 18; 



Qy 21 SQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77



11:11111 : : | :::::: | I : ||: ::|| ::: :: 

Db 13 SQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72 

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137

I = I I I : I I I : : I I I : : I : I I : I : II I I | : ! I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129 

Qy 138 LLSSLTVRETLHYTALLAIRRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196

: : : I I I I I I : : I I : : : : hill III : I : I : I I 

Db 130 VMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIRGVSGG 189 

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256 

11:111 : I : I I : : I I I I I I I I I I I : : : I I : : : : | : : : | | | | | : 
Db 190 ERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312 

I : I I I : : I : I- I : I I I I : I III : : I I I I : : I : : I : : : 

Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE IETSKR VQMIESAYKKSAICHKT LKNIERMKHLKTLPMVPF 356

I : MM: :: : | | : : | | | : | : s s 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL — LFFVLRVRSNVLKG 414 

I : I : : I MM I II::: : : : I I : : : I I : I 
Db 370 TT SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTWLGLVTGAIYFGLKNDST 421

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473

MM Ml: M :: M I I I : : I I I : I I : || 

Db 422 GI QNRAGVLF- FLTTNQCFS SVSAVELFWEKKLFI HEYI S GYYRVS S YFLGKLLSDLLP 480 

Qy 474 FSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533

: : : : I I : : I : I I I I : I I : : : : : | | : : | : 

Db 481 MRMLPSI I FTCIVYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAI AAGQSWSVA 537 

Qy 534 VALLSIAGV — LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSN 591 

MM I : : M I M : : I I : : I I I M I I I I 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFIEKTCPG 615 

: : I I I : M 
Db 595 LNATGNNPCNYA TCTG 610 



RESULT 7 
Q8IX16 

ID Q8IX16 PRELIMINARY; PRT; 655 AA. 

AC Q8IX16; 

DT 01-MAR-2003 (TrEMBLrel. 23, Created) 

DT 01-MAR-2003 (TrEMBLrel. 23, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette protein ABCG2.

GN ABCG2.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606;



RN [1] 

RP SEQUENCE FROM N.A. 

RA Yoshikawa M., Yabuuchi H., Ikegami Y., Ishikawa T.;

RL Submitted (DEC-2001) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AF463519; AAO14617.1; -.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW ATP-binding. 

SQ SEQUENCE 655 AA; 72314 MW; A8AF60B591D4C5A8 CRC64; 



Query Match 20.4%; Score 679.5; DB 4; Length 655; 

Best Local Similarity 29.2%; Pred. No. 1.1e-41;

Matches 182; Conservative 137; Mismatches 250; Indels 55; Gaps 18; 

Qy 21 SQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77

II: I I I I I : : | :::::: | | : | | : : : I | : : : : : 

Db 13 SQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72 

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137 

I : III :| l|::||| :: I :| I : | : || I I |: I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129 

Qy 138 LLSSLTVRETLHYTALLAIRRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196

: : : I I I I I I : : I I : : : : I : I I I III : I : I : I I 

Db 130 VMGTLTVRENLKFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIRGVSGG 189 

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256

I I : I I I : I : I I : : I I I I I I I I | | | : : : | | : : : : I : : : M I I I : 
Db 190 ERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312

I : I I I : : I : I I : I I I I : I III : : I I I I : : I : : I : : : 

Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE IETSKR VQMIESAYKKSAICHKT LKNIERMKHLKTLPMVPF 356

I : II II: :: : | |: :| | |: | : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL — LFFVLRVRSNVLKG 414 

I : I : : I : I I : I I I : : : : : : I I : : : I I : | 
Db 370 TT SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTWLGLVIGAIYFGLKNDST 421 

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473

I I : I I : I : I : : : I I I I I : : I I I : I I : | | 

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480 



Qy 474 FSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533

: : : : II : : I : I I I I : I I : : : : : | | : : | : 

Db 481 MRMLPSIIFTCIWFMLGLKPKADAFFVMMFTLM MVAY S AS SMALAI AAGQ S WS VA 537 

Qy 534 VALLSIAGV— LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSN 591 

I : : I I : : I I I I : : : I I : : | | I I I I I I | 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFIEKTCPG 615 

I I I : III 
Db 595 LNATGNNPCNYA TCTG 610 



RESULT 8 
Q96LD6 

ID Q96LD6 PRELIMINARY; PRT; 655 AA. 

AC Q96LD6; 

DT 01-DEC-2001 (TrEMBLrel. 19, Created) 

DT 01-DEC-2001 (TrEMBLrel. 19, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ABC transporter ABCG2.

GN ABCG2.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A. 

RA Schuetz J.D., Wall A.M., Sampath J., Sorrentino B., Du G.;

RT "The Human ABC Transporter, ABCG2, Transports Hoechst 33342 and

RT Requires an Intact Walker A Motif."; 

RL Submitted (JAN-2001) to the EMBL/GenBank/DDBJ databases. 

DR EMBL; AY017168; AAG52982.1; -. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1.

KW ATP-binding. 

SQ SEQUENCE 655 AA; 72288 MW; B3B5DC02C095C4A8 CRC64; 

Query Match 20.2%; Score 672.5; DB 4; Length 655; 

Best Local Similarity 29.0%; Pred. No. 3.7e-41; 

Matches 181; Conservative 137; Mismatches 251; Indels 55; Gaps 18; 

Qy 21 SQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77

II: I I I I I : : I :::::: | | : | | : : : | | : : : : : 

Db 13 SQGNTNGFPATASNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72



Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137 

I : I I I : I I I : : I I I : : I : I I : I : I I I I I : I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLAARKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129 

Qy 138 LLSSLTVRETLHYTALLAIRRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196

: : : I I I I I I : : I I : : :.: I : I I I III : I : I : I I 

Db 130 VMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIQELGLDKVADSKVGTQFIRGVSGG 189

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256

I I : I I I : I : I I : : I I I I I I I I I I I : : : I I : : : : I : : : I I II I : 
Db 190 ERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ --SK 312 

hill : :|: I 1:1 I I I :| III : : I I I I : : I : : I : :: 
Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE IETSKR VQMIESAYKKSAICHKT LKNIERMKHLKTLPMVPF 356 

I : I I I h : : : I h : I I I : I : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL— LFFVLRVRSNVLKG 414 

I : I : : I : I h I I I : : : : : : I I : : : I h I 
Db 370 TT SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTWLGLVIGAIYFGLKNDST 421

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473

I I : I I : I : I : : : | | | | | : : | | | : | | : | | 

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480 

Qy 474 FSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533

: : : : I I : : I: I I I : I I : : : : : I I : : h 

Db 481 MRMLPSI I FTCIVYFMLGLKAKADAFFVMMFTLM MVAYSAS SMALAIAAGQSWS VA 537 

Qy 534 VALLSIAGV — LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSN 591 

h : I I : : | | | h : : I h : I I I I I I II I 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFIEKTCPG 615

: : I I I : III 
Db 595 LNATGNNPCNYA TCTG 610 



RESULT 9 
Q8MIB3 

ID Q8MIB3 PRELIMINARY; PRT; 656 AA. 

AC Q8MIB3; 

DT 01-OCT-2002 (TrEMBLrel. 22, Created) 

DT 01-OCT-2002 (TrEMBLrel. 22, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Brain multidrug resistance protein. 

GN BMDP. 

OS Sus scrofa (Pig).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Cetartiodactyla; Suina; Suidae; Sus. 

OX NCBI_TaxID=9823; 

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=22050127; PubMed=12054514;

RA Eisenblaetter T., Galla H.J.;

RT "A new multidrug resistance protein at the blood-brain barrier.";

RL Biochem. Biophys. Res. Commun. 293:1273-1278(2002).

DR EMBL; AJ420927; CAD12785.1; -. 

DR PIR; JC7860; JC7860. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1.

KW ATP-binding. 

SQ SEQUENCE 656 AA; 72392 MW; 118ADD5B53D9D67F CRC64; 

Query Match 20.1%; Score 668.5; DB 6; Length 656; 

Best Local Similarity 28.5%; Pred. No. 7.3e-41; 

Matches 180; Conservative 144; Mismatches 252; Indels 55; Gaps 18; 

Qy 13 MGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDV 72 

::::::: I I : : : : | : : : : : | | : | | : : : I I : : 

Db 8 VS I PMSKRNTNGLPGSS SNELKTSAGGAVLS FHDI CYRVKVKSGFLFCRKTVEKEI LTNI 67 

Qy 73 SLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYV 132 

: : : I : I I I : I I I : : I I I : : I I I : I : I I I I I : II 

Db 68 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPHG-LSGDVLING-APRPANFKCNSGYV 124

Qy 133 LQSDTLLSSLTVRETLHYTALLAIRRGNPGSF QKKVEAVMAELSLSHVADRLIGN 187

: I I : : : I I I I I I : : I I : I : :: : I : I I I III : I 

Db 125 VQDDWMGTLTVRENLQFSAALRL PTTMTNHEKNERINMVIQELGLDKVADSKVGT 180 

Qy 188 YSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVL 247

: I : I I I I : I I I I : I : I I : : I I I I I I I I I I I : : : I I ::::!:: 
Db 181 QFIRGVSGGERKRTSIAMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIF 240 

Qy 248 TIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSV 307

: I I I I I : I : I I I : : I : I I : I I I I : I III : : I I I I : : I : : 
Db 241 SIHQPRYSIFKLFDSLTLLASGRLMFHGPAREALGYFASIGYNCEPYNNPADFFLDVING 300 

Qy 308 DTQ S KEREI ET S KRVQMI E SAYKKSAI CHKTLKNI E RMK 346 

I: ::| I I :|: : I |: I : : : | 

Db 301 DSSAVVLSRADRDEGAQEPEEPPEKDTPLIDKLAAFYTNSSFFKDTKVELDQFSGGRKKK 360 

Qy 347 HLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL — LFFV 404 

I : I : I : I I : I I : I | : : :: : I : I I : : | : 

Db 361 KSSVYKEVTYTT S FCHQLRWI S RRS FKNLLGNPQASVAQI I VTI I LGLVI GAI FYD 416 

Qy 405 LRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMM 464

I: I I I 1:1 1:1: I: ::|| I I : : I I I : 

Db 417 LK NDPSG-IQNRAGVLF-FLTTNQCFSSVSAVELLWEKKLFIHEYISGYYRVSSYF 471 



Qy 465 LAYAL-HVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGI 523

I : I I : : : : | I : : I : I | | | I I | : : : : : I I 

Db 472 FGKLLSDLLPMRMLPSIIFTCITYFLLGLKPAVGSFFIMMFTLM MVAYSAS SMALAI 528 

Qy 524 VQNPNIVNSVVALLSIAGV--LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFY 581

::|: |::|: I :: II I |:: : : ||: :| | ||| 

Db 529 AAGQSWSVATLLMTISFVFMMIFSGLLVNLKTWPWLSWLQYFSIPRYGFSALQYNEFL 588 

Qy 582 GLNFTCGSSNVSVTTNPMCAFT — QGIQFIE 610 

III I : : I I I I 1:1 I : : : I 

Db 589 GQNFCPG LNVTTNNTCSFAI CTGAEYLE 616 



RESULT 10 
Q7TMS5 

ID Q7TMS5 PRELIMINARY; PRT; 657 AA. 

AC Q7TMS5; 

DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 2. 

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A.

RC STRAIN=C57BL/6NCr; TISSUE=Hematopoietic Stem Cell;

RX MEDLINE=22388257; PubMed=12477932;

RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,

RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,

RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,

RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,

RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,

RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,

RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,

RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,

RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,

RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,

RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,

RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,

RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,

RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,

RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M., Butterfield Y.S.,

RA Krzywinski M.I., Skalska U., Smailus D.E., Schnerch A., Schein J.E.,

RA Jones S.J., Marra M.A.;

RT "Generation and initial analysis of more than 15,000 full-length human 

RT and mouse cDNA sequences."; 

RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC STRAIN=C57BL/6NCr; TISSUE=Hematopoietic Stem Cell;

RA Strausberg R.;

RL Submitted (JUN-2003) to the EMBL/GenBank/DDBJ databases.

DR EMBL; BC053730; AAH53730.1; -. 

KW ATP-binding. 

SQ SEQUENCE 657 AA; 72977 MW; DCD70C5D9FA2BA5F CRC64;



Query Match 19.9%; Score 663; DB 11; Length 657; 

Best Local Similarity 28.2%; Pred. No. 1.9e-40; 

Matches 182; Conservative 135; Mismatches 241; Indels 88; Gaps 19; 

Qy 13 MGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDV 72

I : I I : III I | : | | : | : : : | : : : | | | : 
Db 12 MSQRNNNGLPRTNSRAVRTLAEGDVLSFHHITYRV — KVKSGFLV RKTVEKEILSDI 66 

Qy 73 SLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYV 132 

: I : I I I : I I I : : I I I : : I I |:| :|| I : I : I || 

Db 67 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPKG-LSGDVLING-APQPAHFKCCSGYV 123 

Qy 133 LQSDTLLSSLTVRETLHYTALLAIRRGNPGSF QKKVEAVMAELSLSHVADRLIGN 187

: I I : : : I I I I I I : : I I : I : : : : : : : I I I I I | : | 

Db 124 VQDDWMGTLTVRENLQFSAALRL PTTMKNHEKNERINTIIKELGLEKVADSKVGT 179

Qy 188 YSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVL 247

: I I I I I I : I I I : I : I I : : I I I I I I I I I I I : : : I I : : : : | : : 
Db 180 QFIRGISGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIF 239 

Qy 248 TIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSV 307 

: I I I I I : I : I I | : : | : | : | : | | : | : : | III : : | | | | : : | : : 
Db 240 SIHQPRYSIFKLFDSLTLLASGKLVFHGPAQKALEYFASAGYHCEPYNNPADFFLDVING 299 

Qy 308 DTQS KEREIETSKR VQMIESAYKKSAICHKTLKNIERMKHLKTLP 352 

I : : : | : : | : | : : : I III :| : : : : 
Db 300 DSSAVMLNREEQDNEANKTEEPSKGEKPVI ENLSEFYINSAIYGETKAELDQL 352 

Qy 353 MVPFKTKDSPGVFSKLGV LLRRVTRNLVRNKLAVITRLLQNL 394 

II II : | | : | | : | | : : | : : 

Db 353 PGAQEKKGTSAFKEPVYVTSFCHQLRWIARRSFKNLLGNPQASVAQLIVTV 403 

Qy 395 IMGLFL — LFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQE 452 

I : I I : : : I I : : : I : I I : I : I : : : I I I I I : : | 

Db 404 I LGLI I GAI YFDLKYDA AGMQNRAGVLF-FLTTNQCFSSVSAVELFVVEKKLFIHE 458 

Qy 453 SQDGLYQKWQMMLAYAL-HVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHL 511

II: : : I I : : : I I : I I : I I I II I : : 
Db 459 YISGYYRVSSYFFGKVMSDLLPMRFLPSVIFTCVLYFMLGLKKTVDAFFIMMFTLI M 515 

Qy 512 IGEFLTLVLLGIVQNPNIVNSVVALLSIAGV--LVGSGFLRNIQEMPIPFKIISYFTFQK 569

: : : I I : : I : I : : I I I : : I I I I : : : : | | : : 

Db 516 VAYTASSMALAIATGQSWSVATLLMTIAFVFMMLFSGLLVNLRTIGPWLSWLQYFSIPR 575 

Qy 570 YCSEILWNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPG 615 

I I I I I I I I : I I I I : I II 
Db 576 YGFTALQYNEFLGQEFCPG FNVTDNSTCVNSYAI CTG 612



RESULT 11 
Q8T691 

ID Q8T691 PRELIMINARY; PRT; 801 AA. 

AC Q8T691; 

DT 01-JUN-2002 (TrEMBLrel. 21, Created) 

DT 01-JUN-2002 (TrEMBLrel. 21, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 



DE ABC transporter AbcG1.

GN ABCG1.

OS Dictyostelium discoideum (Slime mold).

OC Eukaryota; Mycetozoa; Dictyosteliida; Dictyostelium. 

OX NCBI_TaxID=44689;

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Ax4; 

RA Anjard C., Loomis W.F.;

RT "Evolution of the ABC transporters of Dictyostelium.";

RL Submitted (FEB-2002) to the EMBL/GenBank/DDBJ databases.

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AF482380; AAL91485.1; -. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transport. 

SQ SEQUENCE 801 AA; 90052 MW; CCC4F0036CB195A3 CRC64; 



Query Match 19.9%; Score 662; DB 5; Length 801;

Best Local Similarity 27.4%; Pred. No. 2.8e-40;

Matches 185; Conservative 134; Mismatches 246; Indels 110; Gaps 16;

Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

: : : : : I I I : : : : : I I I I I : I I I : I I I I I I I : : I I I : I : I : I I 
Db 131 KKKISKQILTNINGHIESGTIFAIMGPSGAGKTTLLDILAHRLNINGS — GTMYLNGNKS 188 

Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQ-KKVEAVMAELSLSH 179

I : I I I I I : I : I I I I I I I I :: I I : I I : : : I : : : I : I : 

Db 189 DFNIFKKLCGYVTQSDSLMPSLTVRETLNFYAQLKMPRDVPLKEKLQRVQDIIDEMGLNR 248 

Qy 180 VADRLIG--NYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVE 237

I I I : I : : I I I I I I I I I : I :: I I I I : I I I I I : I I I I : : : I : 
Db 249 CADTLVGTADNKI RGI SGGERRRVTI S I ELLTGPSVI LLDEPTSGLDASTS FYVMSALKK 308 

Qy 238 LARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNP 297

II: I : : I I I I I I I : : : I I : : I I I : I : I : : I I I I I I : I I 
Db 309 LAKSGRTIICTIHQPRSNIYDMFDNLLLLGDGNTIYYGKANKALEYFNANGYHCSEKTNP 368

Qy 298 FDFYMDL — TSVDTQS 311 

I I : : I I I I : I : 

Db 369 ADFFLDLINTQVEDQADSDDDDYNDEEEEI GGGGGGSGGGAGGI EDI GI SI S PTMNGSAV 428 

Qy 312 KEREIE TSKRVQMIESAYKKS AICHKTLKN 341 

II:: I : : : : : I | : : I I | 

Db 429 DNIKNNELKQQQQQQQQQQQSTDGRARRRIKKLTKEEMVILKKEYPNSEQGLRVNETLDN 488 



Qy 342 IER MKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIM 396



I : I : I I I : : : I I | I I : : I | : | : 

Db 489 I SKENRTDFKYEKT RGPNFLTQFSLLLGREVTNAKRHPMAFKVNLIQAIFQ 539 

Qy 397 GLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDG 456 

I II : : : : : : : | | | : : : : :::::: | | : : : : | 
Db 540 G— LLCGIVYYQLGLGQSSVQSRTGWAFIIMGVSFPAVMSTIHVFPDVITIFLKDRASG 597 

Qy 457 LYQKWQMMLAYALHVLP FS WATMI FS SVCYWTLG — LHPEVARFGYFSAALLAPHLIGE 514 

: I I I : : | : | : : : : | | : I : : I I : | : 
Db 598 VYDTL P FFLAKS FMDAC I AVLLPMWAT I VYWMTNQRVD P F YS AAP FFRFVLM LVLA 654 

Qy 515 FLTLVLLGIVQN PNI-VNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKY 570

I : I I : : : I I : I : I I I : : I I I I : : : I I : I : I 

Db 655 SQTCLSLGVLISSSVPNVQVGTAVAPLIVILFFLFSGFFINLNDVPGWLVWFPYISFFRY 714 

Qy 571 CSEILWNEFYGLNFTCGS SNVSVTTNPMCAFTQGIQFI EKTCPGATSRFTMNFLI LYS F 630 

I I : I I : : I I I I : I I II I I I I : 

Db 715 MI EAAVI N AFKDVH FT CT D S Q KI GGVC P VQ YGNNVI E-NMG YD I DH FWRNVW I LVL Y 770 

Qy 631 IPALVILGIVVFKIR 645

I : I : I I : : 
Db 771 IIGFRVLTFLVLKLK 785 



RESULT 12 
Q9R004 

ID Q9R004 PRELIMINARY; PRT; 657 AA. 

AC Q9R004; 

DT 01-MAY-2000 (TrEMBLrel. 13, Created) 

DT 01-MAY-2000 (TrEMBLrel. 13, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Breast cancer resistance protein 1. 

GN ABCG2 OR BCRP1.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=FVB; TISSUE=Liver;

RX MEDLINE=99413474; PubMed=10485464;

RA Allen J.D., Brinkhuis R.F., Wijnholds J., Schinkel A.H.;

RT "The mouse Bcrp1/Mxr/Abcp gene: amplification and overexpression in

RT cell lines selected for resistance to topotecan, mitoxantrone, or 

RT doxorubicin."; 

RL Cancer Res. 59:4237-4241(1999). 

DR EMBL; AF140218; AAD54216.1; -. 

DR MGD; MGI:1347061; Abcg2.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR006162; Ppantne_S.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

DR PROSITE; PS00012; PHOSPHOPANTETHEINE; 1. 

KW ATP-binding. 

SQ SEQUENCE 657 AA; 73021 MW; 207B70BC272CC0D5 CRC64; 



Query Match 19.8%; Score 660; DB 11; Length 657; 

Best Local Similarity 28.0%; Pred. No. 3.1e-40; 

Matches 181; Conservative 135; Mismatches 242; Indels 88; Gaps 19; 

Qy 13 MGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDV 72 

I : I I III I | : | | : I : : : I : : : I I I : 
Db 12 MSQRNNNGLPRMNSRAVRTLAEGDVLSFHHITYRV — KVKSGFLV RKTVEKEILSDI 66 

Qy 73 SLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYV 132

: :: I : I I I : I I I : : I I I : : I I |:| :|| I : |: I II 

Db 67 NGIMKPG-LNAILGPTGGGKSSLLDVLAARKDPKG-LSGDVLING-APQPAHFKCCSGYV 123 

Qy 133 LQSDTLLSSLTVRETLHYTALLAIRRGNPGSF QKKVEAVMAELSLSHVADRLIGN 187

: I I : : : I I I I I I : : I I : I : : : : : : : I I I III : I 

Db 124 VQDDWMGTLTVRENLQFSAALRL PTTMKNHEKNERINTIIKELGLEKVADSKVGT 179 

Qy 188 YSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVL 247

: I I I I I I : I I I : I : I I : : I I I I I I I I I I I : : : I I : : : : I : : 
Db 180 QFIRGISGGERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIF 239 

Qy 248 TIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSV 307 

: I I I I I : I : | I I : : I : I : I : I I : I : : I III : : I I I I : : I : : 
Db 240 SIHQPRYSIFKLFDSLTLLASGKLVFHGPAQKALEYFASAGYHCEPYNNPADFFLDVING 299 

Qy 308 DTQS KEREIETSKR VQMIESAYKKSAICHKTLKNIERMKHLKTLP 352

I : : : | : : | : | : : : I I I I : I : : : : 
Db 300 DSSAVMLNREEQDNEANKTEEPSKGEKPVIENLSEFYINSAIYGETKAELDQL 352 

Qy 353 MVPFKTKDSPGVFSKLGV LLRRVTRNLVRNKLAVITRLLQNL 394

II II : M : I I : I I : : I : : 

Db 353 PGAQEKKGTSAFKEPVWTSFCHQLRWIARRSFKNLLGNPQASVAQLIVTV 403 

Qy 395 IMGLFL--LFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQE 452

I : I I : : : I I : : : I : I I : I : I : : : I I I I I : : I 

Db 404 ILGLIIGAIYFDLKYDA AGMQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHE 458 

Qy 453 SQDGLYQKWQMMLAYAL-HVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHL 511

II: : :|| : ::||: : |: III II |: : 
Db 459 YISGYYRVSSYFFGKVMSDLLPMRFLPSVI FTCILYFMLGLKKTVDAFFIMMFTLI M 515 

Qy 512 IGEFLTLVLLGIVQNPNIVNSVVALLSIAGV--LVGSGFLRNIQEMPIPFKIISYFTFQK 569

: : : I I : : I : I : : I I I : : I I I I : : : : I I : : 

Db 516 VAYTASSMALAIATGQSWSVATLLMTIAFVFMMLFSGLLVNLRTIGPWLSWLQYFSIPR 575 

Qy 570 YCSEILWNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPG 615

I I I I I I I I : I I I I : I II 
Db 576 YGFTALQYNEFLGQEFCPG FNVTDNSTCVNSYAI CTG 612



RESULT 13 



Q9M3D6 

ID Q9M3D6 PRELIMINARY; PRT; 725 AA. 

AC Q9M3D6; 

DT 01-OCT-2000 (TrEMBLrel. 15, Created) 

DT 01-OCT-2000 (TrEMBLrel. 15, Last sequence update)

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE ABC transporter-like protein. 

GN T26I12.10 OR AT3G55130. 

OS Arabidopsis thaliana (Mouse-ear cress).

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta;

OC Spermatophyta; Magnoliophyta; eudicotyledons; core eudicots; rosids;

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis. 

OX NCBI_TaxID=3702;

RN [1] 

RP SEQUENCE FROM N.A. 

RA Monfort A., Casacuberta E., Puigdomenech P., Mewes H.W., Lemcke K., 

RA Mayer K.F.X., Quetier F., Salanoubat M.;

RL Submitted (NOV-1999) to the EMBL/GenBank/DDBJ databases.

RN [2] 

RP SEQUENCE FROM N.A. 

RA EU Arabidopsis sequencing project; 

RL Submitted (FEB-2000) to the EMBL/GenBank/DDBJ databases.

RN [3] 

RP SEQUENCE FROM N.A. 

RA Yamada K., Liu S.X., Pham P.K., Banh J., Dale J.M., Goldsmith A.D., 

RA Jiang P.X., Lee J.M., Onodera C.S., Quach H.L., Tang C, Toriumi M. , 

RA Yamamura Y. , Yu G., Yu S., Bowser L. , Carninci P., Chen H., Cheuk R. , 

RA Hayashizaki Y., Ishida J., Jones T., Kamiya A., Karlin-Neumann G., 

RA Kawai J., Kim C, Koesema E. , Lam B. , Lin J., Meyers M.C., Miranda M. , 

RA Narusaka M. , Nguyen M. , Palm C.J., Sakurai T., Satou M. , Seki M., 

RA Shinn P., Southwick A., Tracy S.E., Shinozaki K., Davis R.W., 

RA Ecker J.R., Theologis A.; 

RT "Full Length cDNA of gene T26I12.10/AT3g55130 (GI:7019646)";

RL Submitted (JUL-2001) to the EMBL/GenBank/DDBJ databases.

RN [4] 

RP SEQUENCE FROM N.A. 

RA Yamada K., Liu S.X., Sakano H . , Pham P.K., Banh J., Etgu P., Lee J.M., 

RA Toriumi M. , Yu G., Brooks S., Chao Q. , Chen H., Karlin-Neumann G. , 

RA Kim C, Lam B., Miranda M. , Nguyen M. , Palm C.J., Shinn P., 

RA Southwick A., Davis R.W. , Ecker J.R., Theologis A. ; 

RT "Arabidopsis Open Reading Frame (ORF) Clones."; 

RL Submitted (FEB-2002) to the EMBL/GenBank/DDBJ databases.

DR EMBL; AL132954; CAB75747.1; -. 

DR EMBL; AY045932; AAK76606.1; -. 

DR EMBL; AY079387; AAL85118.1; -. 

DR PIR; T47652; T47652. 

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.
KW ATP-binding. 

SQ SEQUENCE 725 AA; 80656 MW; 790C535A7929CC16 CRC64; 

Query Match 19.5%; Score 649.5; DB 10; Length 725; 

Best Local Similarity 29.4%; Pred. No. 2.1e-39; 

Matches 182; Conservative 124; Mismatches 246; Indels 67; Gaps 15; 
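
The percentages in the summary lines above can be cross-checked from the column counts. The following is a minimal sketch, assuming that "Best Local Similarity" is the number of identical matches divided by the total number of alignment columns (matches + conservative + mismatches + indels). That definition is inferred from the figures in this listing and is not stated by the search program; the function name is hypothetical.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent identity over all alignment columns (assumed definition)."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Counts reported for RESULT 13 (Q9M3D6) in this listing.
pct = best_local_similarity(182, 124, 246, 67)
print(f"{pct:.1f}%")
```

Under this assumption the same arithmetic also reproduces the values reported for the other results in this listing, for example 184 matches over 652 columns for RESULT 14.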

Qy 33 PEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGK 92 

I I : I : Ihll II : : I I I I I I : : I I : I I : I I 

Db 68 PVPYVLNFNNLQYDVTLRRR FGFS RQNGVKTLLDDVS GEASDGDI LAVLGAS GAGK 123 

Qy 93 TTLLDAMSGRLGRAGTFLGEVYVNG-RALRREQFQDCFSYVLQSDTLLSSLTVRETLHYT 151 

: I I : I I :: I I : I : I I : I I : I : : : I I : I I I I I I : I I I : 
Db 124 STLIDALAGRVAE-GSLRGSVTLNGEKVLQSRLLKVI SAYVMQDDLLFPMLTVKETLMFA 182 

Qy 152 ALLAIRRG-NPGSFQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQD 210

: : I : : : | | | : : : | | : | : : | | : I : I I I I I I I I I : : I 

Db 183 SEFRLPRSLSKSKKMERVEALIDQLGLRNAANTVIGDEGHRGVSGGERRRVSIGIDIIHD 242 

Qy 211 PKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGE 270 

II: lllhlll I : I : I : I : I I : : : I I I I : : : I I : : I I I I : 
Db 243 PIVLFLDEPTSGLDSTNAFMWQVLKRIAQSGSIVIMSIHQPSARIVELLDRLIILSRGK 302 

Qy 271 LIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETS 319 

: I I : I I : I I : I I I I I I : I : I I Ihll 
Db 303 SVFNGSPASLPGFFSDFGRPIPEKENISEFALDLV RELEGSNEGTKALVDFN 354 

Qy 320 KRVQMI ESAYK KSAICHKTL — KNIERMKHLKTLPMVPFKTKD 360 

:: : I : I I : III I : I : 

Db 355 EKWQQNKISLIQSAPQTNKLDQDRSLSLKEAINASVSRGKLVSGSSRSNPTSMETVSSYA 414 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

: I : I : I : I : I : I III: : : I II I : : I I I : I : 

Db 415 NPSLFETF-ILAKRYMKNWIRMPELVGTRIATVMVTGC-LIAT^ 471 

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWATM 480 

I: I I I : I: I :| I : :|: |: :::::| I I : :: 

Db 472 -TLFAFWPTMFYCCLDNVPVFIQERYIFLRETTHNAYRTSSYVISHSLVSLPQLLAPSL 530 

Qy 481 I FS SVCYWTLGLHPEVARFGYFSAALLAPHLI GEFLTLVLLGI VQNPNIVNS- WALLS I 539 

: I I : : : I I : I I : I : : : I I : : I : I III: : I : : : 

Db 531 VFSAITFWTVGLSGGLEGFVFYCLLIYASFWSGSSWTFISGW— PNIMLCYMVSITYL 588 

Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPM 599 

11:1111 : I : I : I I I : : : I I I : I 

Db 589 AYCLLLSGFYVNRDRIPFYWTWFHYISILKYPYEAVLINEF DDPS 633 

Qy 600 CAFTQGIQFIEKTCPGATS 618 

I : I : I : I I I 

Db 634 RC FVRGVQVFD S TLLGGVS 652 



RESULT 14 
Q7YYX5 

ID Q7YYX5 PRELIMINARY; PRT; 643 AA. 

AC Q7YYX5; 



DT 01-OCT-2003 (TrEMBLrel. 25, Created) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last sequence update)

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Putative ABC transporter protein, possible. 

GN 1MB.836.

OS Cryptosporidium parvum. 

OC Eukaryota; Alveolata; Apicomplexa; Coccidia; Eimeriida; 

OC Cryptosporidiidae; Cryptosporidium. 

OX NCBI_TaxID=5807; 

RN [1] 

RP SEQUENCE FROM N. A. 

RC STRAIN=Iowa; 

RA Bankier A.T., Spriggs H.F., Fartmann B., Konfortov B.A. , Madera M. , 

RA Vogel C, Teichmann S.A. , Ivens A., Dear P.H.; 

RT "Integrated mapping, chromosomal sequencing and sequence analysis of 

RT Cryptosporidium parvum."; 

RL Genome Res. 0:0-0(2003). 

DR EMBL; BX538353; CAD98355.1; -. 

SQ SEQUENCE 643 AA; 72336 MW; 9978B2B42D9809C5 CRC64; 

Query Match 19.2%; Score 639.5; DB 5; Length 643; 

Best Local Similarity 28.2%; Pred. No. 9.8e-39; 

Matches 184; Conservative 134; Mismatches 265; Indels 69; Gaps 19;

Qy 25 LEGAPATA PEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQI 81 

: I I I | : | : : | : : : : : I I : I I : : I : 

Db 6 IESKPVRGSIFPPNADQGVYLAATDISYQI TSGVFEQSTARILSGIKFFAEPKTM 60 

Qy 82 MCILGSSGSGKTTLLDAMSGRLGRAGTFL — GEVYVNGRALRREQFQDCFSYVLQSDTLL 139 

I I I I I I I I I : I I :: I I I I I I I : I : I I : : : : I I I : I : : 
Db 61 TAILGPSGSGKTSLLNILSGRLSSTGNKLVGGSIYLNGKKVTSKDLKSRCSYVMQHEMTI 120 

Qy 140 S SLTVRETLHYTALLAI RRGNPGS FQKKVEAVMAELSLSHVADRLI GNYSLGGI STGERR 199 

11:1111:11: : : : I I : : : I I I : : I : : Mill: 

Db 121 P YLTI EETLLYSAELRLP FLSAKERREKVRI LLNDLGLVHCMHS IVGDDKVRS I SGGERK 180 

Qy 200 RVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIWLLVELARRNRIWLTIHQPRSELFQL 259 

II: : I : I I : :: II I I : I I I I I I : I I : : I I : I : : I I I I I I : : : I I 
Db 181 RVILGTELISDPQILFIDEPTSGLDAFMAFQILQLLIKLAKTGRTIICTIHQPRTQVFQA 240 

Qy 260 FDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDL TSVDTQSKER- 314 

I I : I : I I I I : I : I : I : I : II I II : I I I : I : I I : : I : : I 

Db 241 FDEILLLSKGEVIYQGPSKSSVDYFSLIGYPVPENYNPTDYYLDLLVPRSNVEKFADSRL 300 

Qy 315 E I ET S KRVQMI E S AYKKS AI CHKT LKN I ERMKHL KTLPMVPFKTKDSPGVFS 366 

I :::::: I I : : : I : : I I : I : : : | I 

Db 301 HSITYEQLRVLPELYLSSEYNDRVIRKID — EHLSGQYSPIPELLLFSRSSHTCFGWIRK 358 

Qy 367 KLGVLLRRVTRNLVRNKL-AVITRLLQNLIMGLFL— LFFVLRVRSN VLKGAI 416 

I : : : I I III::: : I I : : : : I I I II II 
Db 359 KLFAFSVIVTCRSFMNNARNTLGSLVIGVLVNAFIAWIGSIFFNLPSFSNDIGITFKNAT 418 

Qy 417 QDRVGLLYQFVGATPYTGMLNAWLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSV 476 

I : : I I I : I : I I : : I III II : I : 

Db 419 NIMGALFFSVMIAT — FGAMIALES FTRFRI I FSRERAKGLYGPATYMLGKHVGDFI FEI 476 



Qy 477 VATMIFSSVCYWTLGLHP EVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNI 529



" ' " • .... . I I I I . . 

Db 477 VPILVFSHIFYFMSNTNSVSYPGWNTLTQYLCYQLTILLTSWASYGLVYFICGITKSLEL 536 

Qy 530 WSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGS 589 

: 1:1 I : I I I I : : : I : I I : I I : I I I I I I I I 

Db 537 AYGIAPLIIIFFVIV-SGFYVTVNKLPLWVSWIKYISFQRYSYSALWNTF-PPNQNWGP 594 

Qy 590 SNVSVTTNPMCAFTQGIQF-IEKTCPGATSRFTMNFLILYSFIPALVILGIV 640 

: I I I : : I I : I : : I I : I I I : 

Db 595 IQTDILLK QFSIDQT SFLLNAW LWLGIL 624 



RESULT 15 
Q8RWI9 

ID Q8RWI9 PRELIMINARY; PRT; 691 AA. 

AC Q8RWI9; 

DT 01-JUN-2002 (TrEMBLrel. 21, Created) 

DT 01-JUN-2002 (TrEMBLrel. 21, Last sequence update) 

DT 01-OCT-2003 (TrEMBLrel. 25, Last annotation update) 

DE Hypothetical protein. 

GN AT3G21090. 

OS Arabidopsis thaliana (Mouse-ear cress) . 

OC Eukaryota; Viridiplantae; Streptophyta; Embryophyta; Tracheophyta; 

OC Spermatophyta; Magnoliophyta; eudicotyledons ; core eudicots; rosids; 

OC eurosids II; Brassicales; Brassicaceae; Arabidopsis. 

OX NCBI_TaxID=3702; 

RN [1] 

RP SEQUENCE FROM N.A. 

RA Southwick A., Karlin-Neumann G., Nguyen M. , Lam B., Miranda M. , 

RA Palm C.J., Bowser L., Jones T., Banh J., Carninci P., Chen H., 

RA Cheuk R. , Chung M.K., Hayashizaki Y., Ishida J., Kamiya A., Kawai J., 

RA Kim C, Lin J., Liu S.X., Narusaka M. , Pham P.K., Sakano H., 

RA Sakurai T . , Satou M. , Seki M. , Shinn P., Yamada K., Shinozaki K. , 

RA Ecker J., Theologis A., Davis R.W.; 

RL Submitted (MAR-2002) to the EMBL/GenBank/DDBJ databases.

RN [2] 

RP SEQUENCE FROM N.A. 

RA Nguyen M. , Karlin-Neumann G. , Southwick A., Tripp M. , Miranda M. , 

RA Palm C.J., Bowser L., Jones T., Banh J., Carninci P., Chen H., 

RA Cheuk R., Chung M.K., Hayashizaki Y., Ishida J., Kamiya A., Kawai J., 

RA Kim C, Lin J., Liu S.X., Narusaka M. , Pham P.K., Sakano H., 

RA Sakurai T . , Satou M. , Seki M. , Shinn P., Yamada K., Shinozaki K., 

RA Ecker J., Theologis A., Davis R.W.; 

RL Submitted (SEP-2002) to the EMBL/GenBank/DDBJ databases.

CC -!- SIMILARITY: BELONGS TO THE ABC TRANSPORTER FAMILY. 

DR EMBL; AY093054; AAM13053.1; -.

DR EMBL; BT000405; AAN15724.1; -.

DR GO; GO:0016020; C:membrane; IEA.

DR GO; GO:0005524; F:ATP binding; IEA.

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; IEA.

DR GO; GO:0000166; F:nucleotide binding; IEA.

DR GO; GO:0006810; P:transport; IEA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Hypothetical protein; ATP-binding; Transport.

SQ SEQUENCE 691 AA; 77219 MW; CE473CC0B440D7E9 CRC64;

Query Match 18.9%; Score 628; DB 10; Length 691; 

Best Local Similarity 28.1%; Pred. No. 7.6e-38; 

Matches 173; Conservative 123; Mismatches 225; Indels 94; Gaps 17; 
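
The three statistics lines that precede each alignment share a simple "Key value;" layout, which makes them straightforward to read programmatically. The sketch below is illustrative only: `parse_stats` is a hypothetical helper, not part of the search program, and it assumes that every field ends with a semicolon and that the value is the last space-separated token of the field.

```python
def parse_stats(line):
    """Split a 'Key value; Key value;' summary line into a dict of strings."""
    fields = {}
    for part in line.split(";"):
        part = part.strip()
        if not part:
            continue
        # The value is the final token; everything before it is the key.
        key, _, value = part.rpartition(" ")
        fields[key] = value
    return fields


stats = parse_stats("Query Match 18.9%; Score 628; DB 10; Length 691;")
print(stats["Score"], stats["Length"])
```

Values are kept as strings so that entries such as `7.6e-38` or `18.9%` can be converted by the caller as needed.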

Qy 25 LEGAPATAPE-PHSLGILHASYSVSHRVRPWWDITSCRQQW TRQILKDVSLYVESG 79 

III:: : I I : : I I hi : I I : : I : : : I I I 

Db 3 LEGSSSGRRQLPSKLEMSRGAYLA WEDLTWIPNFSDGPTRRLLQRLNGYAEPG 56 

Qy 80 QIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYWGRALRREQFQDCFSYVLQSDTLL 139 

: I I I : I I I I II : I I I I : : : I I I I I : : I I : I : : I I I I I I 

Db 57 RIMAIMGPSGSGKSTLLDSLAGRLARNVVMTGNLLLNGKKARLD — YGLVAYVTQEDVLL 114 

Qy 140 S S LTVRETLH YTALLAI RRGNPGS FQKK VEAVMAELSLSHVADRLIGNYSLGGIS 194 

: I I I f I I : I : I I : I I : II : II | : I I : I I I : I : I 

Db 115 GTLTVRETITYSAHLRL PSDMSKEEVSDIVEGTIIELGLQDCSDRVT GNWHARGVS 17 0 

Qy 195 TGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRS 254

I I I : I I I I I :: I I : : : 1111:111 : I : : I : I I I I : : : I I I I 
Db 171 GGERKRVSIALEILTRPQILFLDEPTSGLDSASAFFVIQALRNIARDGRTVTSSVHQPSS 230

Qy 255 ELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKER 314 

I : I I I I : : I I I I :: I ::|| : |:|||: III:: : I : 

Db 231 EVFALFDDLFLLSSGESVYFGEAKSAVEFFAESGFPCPKKRNPSDHFLRCINSDFDTVTA 290 

Qy 315 EIETSKRVQ MIESAYKKSAICHKTLKNI ERMKHLKTLPMV 354 

: : I : I : I : : I : I I : I I : : : : I I 

Db 291 TLKGSQRIQETPATSDPLMNLATSVIKARLVEN-YKRSKYAKSAKSRIRELSNIEGLEME 349 

Qy 355 PFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKG 414

I :: : :| I I |: |: ||:: ::: : 
Db 350 IRKGSEATW-WKQLRTLTARSFINMCRDVGYYWTRIISYIWSI 392 

Qy 415 AI QDRVGLL YQ FVGAT P YT GMLNAVN L FPVL RAVSDQESQDG 456 

I I : : I I : I I : I I : II I : I I 

Db 393 SVGT I FYDVGYS - YTS I LARVS CGGFITGFMT FMS I GGFPS FLEEMKVFYKERLS G 447 

Qy 457 LYQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFL 516 

I :|: : I I I ::| :: I : I : : :| : : I I 

Db 448 YYGVSVYI LSNYI S S FP FLVAI SVITGTIT YNLVKFRPGFSHYAFFCLNI FFSVSVI ESL 507 

Qy 517 TLVLLGIVQNPNIVNSWALLSIAG-VLVGSGFLRNIQEMPIPFKI ISYFTFQKY 570 

: I : : I II: : : : I : : : I I I I : : : I II : I I : : : 

Db 508 MMWASW— PNFLMGLITGAGLIGIIMMTSGFFRLLPDLP KIFWRYPVSYISYGSW 562 

Qy 571 CSEILWNEFYGLNF 585 

: I : I I I I 

Db 563 AIQGGYKNDFLGLEF 577 



Search completed: February 27, 2004, 07:15:28 
Job time : 37.1394 secs



GenCore version 5.1.6 
Copyright (c) 1993 - 2004 Compugen Ltd. 



OM protein - protein search, using sw model 

Run on: February 27, 2004, 06:40:43 ; Search time 10.0797 Seconds 

(without alignments) 
3362.970 Million cell updates/sec 

Title: US-09-989-981A-6

Perfect score: 3326

Sequence: 1 MGDLSSLTPGGSMGLQVNRG..........PALVILGIVVFKIRDHLISR 651

Scoring table: BLOSUM62

Gapop 10.0 , Gapext 0.5 
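
The two gap parameters above describe an affine gap penalty: opening a gap is expensive and each additional gapped position is cheap. The sketch below shows the arithmetic under one common convention, in which a gap of length k costs Gapop + k × Gapext. Whether the search program charges the extension for the first gapped position is an assumption here, and `gap_cost` is a hypothetical name used only for illustration.

```python
GAP_OPEN = 10.0    # "Gapop" from the search header
GAP_EXTEND = 0.5   # "Gapext" from the search header


def gap_cost(length, gap_open=GAP_OPEN, gap_extend=GAP_EXTEND):
    """Penalty for one gap of `length` positions (assumed: open + length * extend)."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length


for k in (1, 2, 10):
    print(k, gap_cost(k))
```

With these parameters a single long gap is penalised far less than several short ones, which is consistent with the relatively long indel runs visible in the lower-scoring alignments.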



Searched: 141681 seqs, 52070155 residues 

Total number of hits satisfying chosen parameters: 141681

Minimum DB seq length: 0

Maximum DB seq length: 2000000000

Post-processing: Minimum Match 0%

Maximum Match 100%
Listing first 45 summaries

Database : SwissProt 42:*



Pred. No. is the number of results predicted by chance to have a 
score greater than or equal to the score of the result being printed, 
and is derived by analysis of the total score distribution. 
ALIGNMENTS



RESULT 1 
ABG5_HUMAN 

ID ABG5_HUMAN STANDARD; PRT; 651 AA. 

AC Q9H222; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).

GN ABCG5 . 

OS Homo sapiens (Human) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A. , AND VARIANT GLU-604. 

RC TISSUE=Liver; 

RX MEDLINE=20553648; PubMed=11099417;

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 

RL Science 290:1771-1775(2000). 

RN [2] 

RP SEQUENCE FROM N.A., VARIANTS SITOSTEROLEMIA HIS-389; HIS-419 AND 



RP PRO-419, AND VARIANT GLU-604. 

RC TISSUE=Liver; 

RX MEDLINE=20578753; PubMed=11138003;

RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H., 

RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G., 

RA Dean M. , Patel S.B.; 

RT "Identification of a gene, ABCG5 , important in the regulation of 

RT dietary cholesterol absorption."; 

RL Nat. Genet. 27:79-83(2001). 

RN [3] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207 ; 

RA Schmitz G. , Langmann T., Heimerl S.; 

RT "Role of ABCG1 and other ABCG family members in lipid metabolism."; 

RL J. Lipid Res. 42:1513-1520(2001). 

RN [4] 

RP VARIANTS SITOSTEROLEMIA GLN-146; HIS-389; PRO-419; HIS-419 AND 

RP SER-550, AND VARIANT GLU-604. 

RX MEDLINE=21344600; PubMed=11452359 ; 

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A. , Hidaka H., Kojima H., 

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E., 

RA Pandya A. , Brewer H.B. Jr., Salen G., Dean M. , Srivastava A.K., 

RA Patel S.B. ; 

RT "Two genes that map to the STSL locus cause sitosterolemia : genomic 

RT structure and spectrum of mutations involving sterolin-1 and 

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively."; 

RL Am. J. Hum. Genet. 69:278-290(2001). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to 
CC ABCG8 along a pathway regulating diatery-sterol absorption and 

CC excretion. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Strongly expressed in the liver, lower levels 

CC in the small intestine and colon. 

CC -!- DISEASE: Defects in ABCG5 are a cause of sitosterolemia 

CC [MIM:210250] ; also known as phytosterolemia or shellfish 

CC sterolemia. It is a rare autosomal recessive disorder 

CC characterized by increased intestinal absorption of all sterols 

CC including cholesterol, plant and shellfish sterols, and decreased 

CC biliary excretion of dietary sterols into bile. Sitosterolemia 

CC patients have hypercholesterolemia, very high levels of plant 

CC sterols in the plasma, and frequently develop tendon and tuberous 

CC xanthomas, accelerated atherosclerosis and premature coronary 

CC artery disease. 

CC -!- SIMILARITY : Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).
SQ SEQUENCE 651 AA; 72503 MW; 950BABFCBB6A1536 CRC64;



Query Match 100.0%; Score 3326; DB 1; Length 651; 

Best Local Similarity 100.0%; Pred. No. 1.8e-230; 

Matches 651; Conservative 0; Mismatches 0; Indels 0; Gaps 0; 

Qy 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I I I I I I I I I I I I I I 
Db 1 MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSC 60 

Qy 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
Db 61 RQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRAL 120 



Qy 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180

I I I I I I I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 121 RREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSHV 180

Qy 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

I I I I I I I I I I I II I I I I I I I I I I I I I I I I I I I I I I I 1 I I I I I I I I I I II I I I I I I I I I I I 
Db 181 ADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240 

Qy 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420

I I I I I II I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II I I I I I I I I I 
Db 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRV 420 

Qy 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 421 GLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATM 480

Qy 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIA 540

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 481 IFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIA 540

Qy 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMC 600

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 541 GVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMC 600

Qy 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651

I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I 
Db 601 AFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651



RESULT 2
ABG5_MOUSE

ID ABG5_MOUSE STANDARD; PRT; 652 AA.

AC Q99PE8;

DT 28-FEB-2003 (Rel. 41, Created)

DT 28-FEB-2003 (Rel. 41, Last sequence update)

DT 28-FEB-2003 (Rel. 41, Last annotation update)

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1).

GN ABCG5.

OS Mus musculus (Mouse).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus.

OX NCBI_TaxID=10090;

RN [1]

RP SEQUENCE FROM N.A.

RC STRAIN=C57BL/6; TISSUE=Liver;

RX MEDLINE=20578753; PubMed=11138003;

RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H.,

RA Allikmets R., Sakuma N., Pegoraro R., Srivastava A.K., Salen G.,


RA Dean M. , Patel S.B.; 

RT "Identification of a gene, ABCG5, important in the regulation of 

RT dietary cholesterol absorption."; 

RL Nat. Genet. 27:79-83(2001). 

RN [2] 

RP TISSUE SPECIFICITY, AND INDUCTION. 

RX MEDLINE=20553648; PubMed=11099417;

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J.,

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.;

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 

RL Science 290:1771-1775(2000). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to 
CC ABCG8 along a pathway regulating diatery-sterol absorption and 

CC excretion (By similarity) . 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Expressed in the intestine and, at lower 
CC level, in the liver. 

CC -!- INDUCTION: Upregulated by cholesterol feeding. Possibly mediated 
CC by the liver X receptor/retinoic X receptor (LXR/RXR) pathway. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF312713; AAG53097.1; -.

DR MGD; MGI:1351659; Abcg5.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Glycoprotein; Transmembrane; Transport. 
FT CARBOHYD 592 592 N-LINKED (GLCNAC...) (POTENTIAL).

SQ SEQUENCE 652 AA; 73244 MW; 80CE37ADCC19771E CRC64;

Query Match 82.3%; Score 2738.5; DB 1; Length 652;



Best Local Similarity 80.1%; Pred. No. 2.1e-188; 

Matches 522; Conservative 64; Mismatches 65; Indels 1; Gaps 1; 
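
The "Query Match" figure can be checked against the "Perfect score" given in the search header. This is a minimal sketch, assuming that Query Match is simply the alignment score expressed as a percentage of the perfect (self-alignment) score of 3326. The relationship is inferred from the numbers in this listing rather than documented here, and `query_match` is a hypothetical name.

```python
PERFECT_SCORE = 3326  # "Perfect score" from the search header


def query_match(score, perfect=PERFECT_SCORE):
    """Alignment score as a percentage of the perfect score (assumed definition)."""
    return 100.0 * score / perfect


# Score reported for RESULT 2 (ABG5_MOUSE).
print(f"{query_match(2738.5):.1f}%")
```

Under this assumption the human self-hit in RESULT 1 scores 3326 out of 3326, which matches its reported Query Match of 100.0%.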

MGDLSSLTPGGSMGLQVNRGSQSSLEGAPATAPEP-HSLGILHASYSVSHRVRPWWDITS 59 

I I : I I : I I : I : I I I I I I I I I I I I I I : I I I I I I I : I I I I I : I I 

MGELPFLSPEGARGPHINRGSLSSLEQGSVTGTEARHSLGVLHVSYSVSNRVGPWWNIKS 60 

CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

I : I : I I I I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I : I I I 
CQQKWDRQILKDVSLYIESGQIMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI RRGNPGSFQKKVEAVMAELSLSH 179 

I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I : I I : I : • I I M I I I I II I I I 
LRRDQFQDCFSYVLQSDVFLSSLTVRETLRYTAMLALCRSSADFYNKKVEAVMTELSLSH 18 0 

VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239 

I I I : : I I : I : I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I I : I I III 
VADQMIGSYNFGGISSGERRRVSIAAQLLQDPKVMMLDEPTTGLDCMTANQIVLLLAELA 24 0 

RRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299 

II : I I I : : I I I I I I I I I I I I I I I I I I :: I I I : I I I I I III I I I : I I I I I I I I I I I I I 
RRDRIVIVTIHQPRSELFQHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 

FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359 

I I I I I I I I I I I I : I I I I I I I I I I I : I I : I : I I II I : I I I I - I I I I I I I I I I I 
FYMDLTSVDTQSREREIETYKRVQMLECAFKESDIYHKILENIERARYLKTLPTVPFKTK 360 

DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419

I I I : I I I I I I I I I I I I I I : I I I III I I : I I I I I I I I I : I : : I I I : : I I I I I : I I I 
DPPGMFGKLGVLLRRVTRNLMRNKQAVIMRLVQNLIMGLFLIFYLLRVQNNTLKGAVQDR 420 

VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

I I I I I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I lllhlll I II I I I II : I I 
VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYHKWQMLLAYVLHVLPFSVIAT 480 

MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSI 539 
: II M I I I M I I I : I M I I I I M I I II I M M I I I I M I I I I I I I I I I I I I I I : I I I I I I 
VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSIVALLSI 540

AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPM 599 

: I : I : I I I I : I I I I I I I I I I I : I I I I I I I I I I I I I II I I I I I I I I I II I : : I I 
SGLLIGSGFIRNIQEMPIPLKILGYFTFQKYCCEILWNEFYGLNFTCGGSNTSMLNHPM 600 

CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIVVFKIRDHLISR 651

II I I I : I I I I I I I I I I I I I I I I I I I I I I I I I I I I I I II : I I : I I : I I I I 
CAITQGVQFIEKTCPGATSRFTANFLILYGFIPALVILGIVIFKVRDYLISR 652 
RESULT 3
ABG5_RAT



ID ABG5_RAT STANDARD; PRT; 652 AA. 

AC Q99PE7; Q8CIQ4; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 10-OCT-2003 (Rel. 42, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 5 (Sterolin-1) . 

GN ABCG5 . 

OS Rattus norvegicus (Rat) . 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi ; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi ; Muridae; Murinae; Rattus. 

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Sprague-Dawley; TISSUE=Small intestine; 

RX MEDLINE=20578753; PubMed=11138003;

RA Lee M.-H., Lu K., Hazard S., Yu H., Shulenin S., Hidaka H., Kojima H., 

RA Allikmets R., Sakuma N. , Pegoraro R., Srivastava A.K., Salen G. , 

RA Dean M. , Patel S.B.; 

RT "Identification of a gene, ABCG5, important in the regulation of 

RT dietary cholesterol absorption."; 

RL Nat. Genet. 27:79-83(2001). 

RN [2] 

RP REVISION TO 2. 

RA Lu K., Lee M.-H., Patel S.B.; 

RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases. 

RN [3] 

RP SEQUENCE FROM N.A., TISSUE SPECIFICITY, AND VARIANT CYS-583. 

RC STRAIN=GH, SHR, SHRSP, Sprague-Dawley, Wistar, Wistar Kyoto, and WKA; 

RX PubMed=12783625;

RA Yu H., Pandit B., Klett E., Lee M.H., Lu K., Helou K., Ikeda I.,

RA Egashira N., Sato M., Klein R., Batta A., Salen G., Patel S.B.;

RT "The rat STSL locus: characterization, chromosomal assignment, and 

RT genetic variations in sitosterolemic hypertensive rats."; 

RL BMC Cardiovasc. Disord. 3:4-4(2003). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG8 or be tightly coupled to 
CC ABCG8 along a pathway regulating diatery-sterol absorption and 

CC excretion (By similarity) . 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- TISSUE SPECIFICITY: Expressed only in liver and intestine. 

CC -!- POLYMORPHISM: The polymorphism at position 583 is found in strains 

CC SHR, SHRSP and Wistar Kyoto which are both hypertensive and 

CC sitosterolemic. Strains which are hypertensive but not 

CC sitosterolemic do not contain a polymorphism at this position. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 

CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; AF312714; AAG53098.3; -. 

DR EMBL; AY145899; AAN64275.1; -. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter . 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Glycoprotein; Transmembrane; Transport; Polymorphism.

FT VARIANT G -> C (in strains SHR, SHRSP and Wistar Kyoto).

SQ SEQUENCE 652 AA; 73372 MW; 49FEF7372269299D CRC64;


Query Match 81.8%; Score 2721.5; DB 1; Length 652; 

Best Local Similarity 79.3%; Pred. No. 3.4e-187; 

Matches 517; Conservative 68; Mismatches 66; Indels 1; Gaps 1; 

Qy 1 MGDLSS LTPGGSMGLQVNRGSQS SLEGAPATAPEP-HSLGI LHAS YSVSHRVRPWWDITS 59 

I : I I : I I : I I I I I I I I I I I I I I I I : I : I : I I I : I I I I I : I I 

Db 1 MSELPFLSPEGARGPHNNRGSQSSLEEGSWGSEARHSLGVLNVSFSVSNRVGPWWNIKS 60 

Qy 60 CRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRA 119 

I : I : I I : I I I I I I I I : I I I I I I I I I I I I I I I I I I I I I : I I I I I II I I I : I I I 
Db 61 CQQKWDRKILKDVSLYIESGQTMCILGSSGSGKTTLLDAISGRLRRTGTLEGEVFVNGCE 120 

Qy 120 LRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQKKVEAVMAELSLSH 179

I I I : I I I I I Ihllll I I I I I I I I I I I I I : I I : I : : I I I I I I : I I I I I I 
Db 121 LRRDQFQDCVS YLLQSDVFLS SLTVRETLRYT7UVIIALRS SS7VDFYDKKVEAVXTELS LSH 18 0 

Qy 180 VADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239 

I I I : : I I I I : I I I I : I I I I I I I I I I I I I I I I I I I : I I I I I I I I I I I I I I I : I I I I I I 
Db 181 VADQMI GNYNFGGI S SGERRRVS IAAQLLQDPKVMMLDEPTTGLDCMTANHIVLLLVELA 24 0 

Qy 240 RRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299 

I I I I I I :: I I I I I I I I I I II I I I II : : I I I : I I I I I III I I I : I I I I I I I I I I I I I 
Db 241 RRNRIVIVTIHQPRSELFHHFDKIAILTYGELVFCGTPEEMLGFFNNCGYPCPEHSNPFD 300 



Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359



I I I I I I I I I I I I : I I I I I I lllll:|ll:::l MM MUM MUM 

Db 301 FYMDLTSVDTQSREREIETYKRVQMLESAFRQSDICHKILENIERTRHLKTLPMVPFKTK 360 

Qy 360 DSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDR 419 

: I I : I I M M M M M M : M I M IMII! : I : : I I I : : I : I I I I : I I I 

Db 361 NPPGMFCKLGVLLRRVTRNLMRNKQWIMRLVQNLIMGLFLIFYLLRVQNNMLKGAVQDR 420 

Qy 420 VGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVAT 479

M M I I I I i Mill I I II: MM i I I I : I I I M M M M M 

Db 421 VGLLYQLVGATPYTGMLNAVNLFPMLRAVSDQESQDGLYQKWQMLLAYVLHALPFSIVAT 480 

Qy 480 MIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIWSWALLSI 539 

: I I I I I I I I I I I I : II II I I M I M I II I I II I II I I M M I M I I I I I I I I M I I M I I 
Db 481 VIFSSVCYWTLGLYPEVARFGYFSAALLAPHLIGEFLTLVLLGMVQNPNIVNSIVTU^LSI 540 

Qy 540 AGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSNVSVTTNPM 599 

: I : I : I I ! I : I M : I I I I I M : M M M M M M M M M M M I I M M Ml 
Db 541 SGLLI GSGFI RNI EEMPI PLKI LGYFTFQKYCCEI LWNEFYGLNFTCGGSNTSVPNNPM 600 

Qy 600 CAFTQGIQFIEKTCPGATSRFTMNFLILYSFIPALVILGIWFKIRDHLISR 651 

I : M M M M M M M M M I M M M M M I I II I M I I I M I M M I 

Db 601 CSMTQGIQFIEKTCPGATSRFTTNFLILYSFIPTLVILGMWFKVRDYLISR 652 



RESULT 4 
ABG8_MOUSE 

ID ABG8_MOUSE STANDARD; PRT; 673 AA. 

AC Q9DBM0; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2). 

GN ABCG8. 

OS Mus musculus (Mouse). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Mus. 

OX NCBI_TaxID=10090; 

RN [1] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2). 

RC STRAIN=C57BL/6; TISSUE=Liver; 

RX MEDLINE=21344600; PubMed=11452359; 

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H., 

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E., 

RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K., 

RA Patel S.B.; 

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic 

RT structure and spectrum of mutations involving sterolin-1 and 

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively."; 

RL Am. J. Hum. Genet. 69:278-290(2001). 
RN [2] 

RP SEQUENCE FROM N.A. (ISOFORM 1). 

RC STRAIN=C57BL/6J; TISSUE=Liver; 

RX MEDLINE=21085660; PubMed=11217851; 

RA Kawai J., Shinagawa A., Shibata K., Yoshino M., Itoh M., Ishii Y., 

RA Arakawa T., Hara A., Fukunishi Y., Konno H., Adachi J., Fukuda S., 

RA Aizawa K., Izawa M., Nishi K., Kiyosawa H., Kondo S., Yamanaka I., 

RA Saito T., Okazaki Y., Gojobori T., Bono H., Kasukawa T., Saito R., 

RA Kadota K., Matsuda H.A., Ashburner M., Batalov S., Casavant T., 

RA Fleischmann W., Gaasterland T., Gissi C., King B., Kochiwa H., 

RA Kuehl P., Lewis S., Matsuo Y., Nikaido I., Pesole G., Quackenbush J., 

RA Schriml L.M., Staubli F., Suzuki R., Tomita M., Wagner L., Washio T., 

RA Sakai K., Okido T., Furuno M., Aono H., Baldarelli R., Barsh G., 

RA Blake J., Boffelli D., Bojunga N., Carninci P., de Bonaldo M.F., 

RA Brownstein M.J., Bult C., Fletcher C., Fujita M., Gariboldi M., 

RA Gustincich S., Hill D., Hofmann M., Hume D.A., Kamiya M., Lee N.H., 

RA Lyons P., Marchionni L., Mashima J., Mazzarelli J., Mombaerts P., 

RA Nordone P., Ring B., Ringwald M., Rodriguez I., Sakamoto N., 

RA Sasaki H., Sato K., Schoenbach C., Seya T., Shibata Y., Storch K.-F., 

RA Suzuki H., Toyo-oka K., Wang K.H., Weitz C., Whittaker C., Wilming L., 

RA Wynshaw-Boris A., Yoshida K., Hasegawa Y., Kawaji H., Kohtsuki S., 

RA Hayashizaki Y.; 

RT "Functional annotation of a full-length mouse cDNA collection."; 

RL Nature 409:685-690(2001). 

RN [3] 

RP TISSUE SPECIFICITY, AND INDUCTION. 

RX MEDLINE=20553648; PubMed=11099417; 

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J., 

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.; 

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 

RL Science 290:1771-1775(2000). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to 
CC ABCG5 along a pathway regulating dietary-sterol absorption and 

CC excretion (By similarity). 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=2; 

CC Name=1; 

CC IsoId=Q9DBM0-1; Sequence=Displayed; 

CC Name=2; 

CC IsoId=Q9DBM0-2; Sequence=VSP_000053; 

CC Note=No experimental confirmation available; 

CC -!- TISSUE SPECIFICITY: Expressed in the intestine and, at lower 

CC level, in the liver. 

CC -!- INDUCTION: Upregulated by cholesterol feeding. Possibly mediated 
CC by the liver X receptor/retinoid X receptor (LXR/RXR) pathway. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; AF324495; AAK84079.1; -. 

DR EMBL; AK004871; BAB23630.1; 



DR MGD; MGI:1914720; Abcg8. 

DR InterPro; IPR003439; ABC_transporter. 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 

KW Glycoprotein; Transmembrane; Transport; Alternative splicing. 

FT DOMAIN 548 569 CYTOPLASMIC (POTENTIAL). 

FT TRANSMEM 640 660 6 (POTENTIAL). 

Missing (in isoform 2). 

/FTId=VSP_000053. 


SQ SEQUENCE 673 AA; 75995 MW; 78012611A5DF2589 CRC64; 



Query Match 21.0%; Score 698; DB 1; Length 673; 

Best Local Similarity 28.7%; Pred. No. 2.5e-42; 

Matches 194; Conservative 133; Mismatches 264; Indels 84; Gaps 18; 

Qy 11 GSMGLQVNRGSQSSLEGAPATAPEPHSLGILHASYSVSHRVR PWWD-I 57 

I : : : : I I II : : : I I : : I : I I I I : : : 

Db 14 GTVLQDASQGLQDSL FSSESDNSLYFTYSGQSNTLEVRDLTYQVDIASQVPWFEQL 69 

Qy 58 TSCRQQWTRQI LKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTF 109 

: I : : : : I I I I I : : I : I I I I I : : I I I : : I I I I 

Db 70 AQFKIPWRSHSSQDSCELGIRNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKM 128 

Qy 110 -LGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKK 167 

I : : : : I I : : : I : : I I I I I : I II I I I I : I : : I : I : 

Db 129 KSGQIWINGQPSTPQLVRKCVAHVRQHDQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKR 188 

Qy 168 VEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMT 227 

111:1111 I : : II : I : I I I I I I I I I I I I : I : : : I I I I : I I I I 
Db 189 VEDVIAELRLRQCANTRVGNTYVRGVSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFT 248 

Qy 228 ANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDC 287 

I : : I I I I : I I : I : : : : I I I I I : : I : I I I : : : : I I : I : I : : I 
Db 249 AHNLVTTLSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQQMVQYFTSI 308 

Qy 288 GYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA ICHKTLKN 341 

I : I I I : I I I I I I : I I I I : I : I I I I I : I : : I : : : : : I 

Db 309 GHPCPRYSNPADFWDLTSIDRRSKEREVATV^KAQSIAALFLEKVQGFDDFLWKAEAKE 368 

Qy 342 IERMKHLKTLPMVPFKTKDS P GVF S KL G VL L RRVT RN L VRN KLAVI T RL LQN 393 

: I : I : I : I : I I : : I : I I I I : : : : 

Db 369 LNTSTHTVSLTL TQDTDCGTAVELPGMIEQFSTLIRRQISNDFRDLPTLLIHGSEA 424 



Qy 394 L I MGL FL L F FVL RVRS NVL KGAI QDRVGL L YQ FVGAT P YT GMLNAVN L F P VL RAVS DQ E S 453 

: | | : | : | : | I I : I : : I : I : I : : I 

Db 425 CLMSLIIGFLYYGHGAKQL — SFMDTAALLFMIGALIPFNVILDWSKCHSERSMLYYEL 482 

Qy 454 QDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIG 513 

:|||| I I II :|:: I I I I I III: 
Db 483 EDGLYTAGPYFFAKILGELPEHCAYVIIYAMPIYWLTNLRPVPELF LL — HFLL 534 

Qy 514 EFL TLVLLGIVQNPNI-VNSWALLSIAGVLVGSGFLRNIQEMPIPFKIISYFT 566 

:| |:| I ::| : :lh I: : I I I : 

Db 535 WLVVFCCRTMALAAS7\MLPTFHMSS FFCNALYNS FYLTAGFMINLDNLWI VPAWI SKLS 594 

Qy 567 FQKYCSEILWNEFYGLNFT — CGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF 624 

I : : I I: :| I :| h I : : I ' ; l 

Db 595 FLRWCFSGLMQIQFNGHLYTTQIGNFTFSILGDTM ISAMDLNS 637 

Qy 625 LI LYS FI PAL VI LGI 639 

II: I:::ll 
Db 638 HPLYAIY— LIVIGI 650 
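
The summary lines for RESULT 4 report 194 matches, 133 conservative substitutions, 264 mismatches and 84 indels, together with a Best Local Similarity of 28.7%. Those figures are consistent with dividing the identical matches by the total number of aligned columns. The Python sketch below checks that arithmetic; the function name and the formula itself are assumptions inferred from the numbers in this listing, not a documented definition from the search program.

```python
def best_local_similarity(matches, conservative, mismatches, indels):
    """Percent of aligned columns that are identical matches (assumed formula)."""
    columns = matches + conservative + mismatches + indels
    return 100.0 * matches / columns


# Counts copied from the RESULT 4 (ABG8_MOUSE) summary lines.
print(round(best_local_similarity(194, 133, 264, 84), 1))  # prints 28.7
```
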



RESULT 5 
ABG8_HUMAN 

ID ABG8_HUMAN STANDARD; PRT; 673 AA. 

AC Q9H221; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2). 

GN ABCG8. 

OS Homo sapiens (Human). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE FROM N.A., VARIANTS SITOSTEROLEMIA THR-231; GLN-263; ARG-574 

RP AND ARG-596, AND VARIANT CYS-54. 

RX MEDLINE=20553648; PubMed=11099417; 

RA Berge K.E., Tian H., Graf G.A., Yu L., Grishin N.V., Schultz J., 

RA Kwiterovich P., Shan B., Barnes R., Hobbs H.H.; 

RT "Accumulation of dietary cholesterol in sitosterolemia caused by 

RT mutations in adjacent ABC transporters."; 

RL Science 290:1771-1775(2000). 

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2), VARIANTS SITOSTEROLEMIA 

RP HIS-184; THR-231; GLN-263; HIS-405; PRO-501; SER-543; PRO-572; 

RP GLU-574; ARG-574; ARG-596 AND PHE-570 DEL, AND VARIANTS HIS-19; 

RP CYS-54; LYS-238; VAL-259; LYS-400; ARG-575 AND ALA-632. 

RC TISSUE=Liver; 

RX MEDLINE=21344600; PubMed=11452359; 

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H., 

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E., 

RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K., 

RA Patel S.B.; 

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic 

RT structure and spectrum of mutations involving sterolin-1 and 

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively."; 



RL Am. J. Hum. Genet. 69:278-290(2001). 

RN [3] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207; 

RA Schmitz G., Langmann T., Heimerl S.; 

RT "Role of ABCG1 and other ABCG family members in lipid metabolism."; 

RL J. Lipid Res. 42:1513-1520(2001). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to 
CC ABCG5 along a pathway regulating dietary-sterol absorption and 

CC excretion. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=2; 

CC Name=1; 

CC IsoId=Q9H221-1; Sequence=Displayed; 

CC Name=2; 

CC IsoId=Q9H221-2; Sequence=VSP_000052; 

CC Note=Minor form detected in approximately 10% of the cDNA 

CC clones; 

CC -!- TISSUE SPECIFICITY: Strongly expressed in the liver, lower levels 
CC in the small intestine and colon. Detectable in a wide variety of 

CC human tissues. 

CC -!- DISEASE: Defects in ABCG8 are a cause of sitosterolemia 

CC [MIM:210250]; also known as phytosterolemia or shellfish 

CC sterolemia. It is a rare autosomal recessive disorder 

CC characterized by increased intestinal absorption of all sterols 

CC including cholesterol, plant and shellfish sterols, and decreased 

CC biliary excretion of dietary sterols into bile. Sitosterolemia 

CC patients have hypercholesterolemia, very high levels of plant 

CC sterols in the plasma, and frequently develop tendon and tuberous 

CC xanthomas, accelerated atherosclerosis and premature coronary 

CC artery disease. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; AF320294; AAG40004.1; -. 

DR EMBL; AF324494; AAK84078.1; -. 

DR EMBL; AF351824; AAK84663.1; -. 

DR EMBL; AF351812; AAK84663.1; JOINED. 

DR EMBL; AF351813; AAK84663.1; JOINED. 

DR EMBL; AF351814; AAK84663.1; JOINED. 

DR EMBL; AF351815; AAK84663.1; JOINED. 

DR EMBL; AF351816; AAK84663.1; JOINED. 

DR EMBL; AF351817; AAK84663.1; JOINED. 



DR EMBL; AF351818; AAK84663.1; JOINED. 

DR EMBL; AF351819; AAK84663.1; JOINED. 

DR EMBL; AF351820; AAK84663.1; JOINED. 

DR EMBL; AF351821; AAK84663.1; JOINED. 

DR EMBL; AF351822; AAK84663.1; JOINED. 

DR EMBL; AF351823; AAK84663.1; JOINED. 

DR Genew; HGNC:13887; ABCG8. 

DR MIM; 605460; 

DR MIM; 210250; 

DR InterPro; IPR003439; ABC_transporter. 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 

KW Glycoprotein; Transmembrane; Transport; Alternative splicing; 

KW Polymorphism; Disease mutation. 

FT DOMAIN 469 492 CYTOPLASMIC (POTENTIAL). 

6 (POTENTIAL). 

CYTOPLASMIC (POTENTIAL). 


FT CARBOHYD 619 619 N-LINKED (GLCNAC...) (POTENTIAL). 

FT VARSPLIC 376 376 Missing (in isoform 2). 

FT /FTId=VSP_000052. 

FT VARIANT 19 19 D -> H. 

FT /FTId=VAR_012250. 

FT VARIANT 54 54 Y -> C. 

FT /FTId=VAR_012251. 

FT VARIANT 184 184 R -> H (in sitosterolemia). 

FT /FTId=VAR_012252. 

FT VARIANT 231 231 P -> T (in sitosterolemia). 

FT /FTId=VAR_012253. 

FT VARIANT 238 238 E -> K. 

FT /FTId=VAR_012254. 

FT VARIANT 259 259 A -> V. 

FT /FTId=VAR_012255. 

FT VARIANT 263 263 R -> Q (in sitosterolemia). 

FT /FTId=VAR_012256. 

FT VARIANT 400 400 T -> K. 

FT /FTId=VAR_012257. 

FT VARIANT 405 405 R -> H (in sitosterolemia). 

FT /FTId=VAR_012258. 

FT VARIANT 501 501 L -> P (in sitosterolemia). 

FT /FTId=VAR_012259. 

FT VARIANT 543 543 R -> S (in sitosterolemia). 

FT /FTId=VAR_012260. 

FT VARIANT 570 570 Missing (in sitosterolemia). 

FT /FTId=VAR_012261. 

FT VARIANT 572 572 L -> P (in sitosterolemia). 

FT /FTId=VAR_012262. 

FT VARIANT 574 574 G -> E (in sitosterolemia). 

FT /FTId=VAR_012263. 

FT VARIANT 574 574 G -> R (in sitosterolemia). 

FT /FTId=VAR_012264. 

FT VARIANT 575 575 G -> R. 

FT /FTId=VAR_012265. 

FT VARIANT 596 596 L -> R (in sitosterolemia). 

FT /FTId=VAR_012266. 

FT VARIANT 632 632 V -> A. 

FT /FTId=VAR_012267. 
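
The FT VARIANT lines of the ABG8_HUMAN feature table above follow a regular pattern: two positions, an original residue, an arrow, a replacement residue, and an optional parenthesised note. As a minimal sketch of how such lines could be read programmatically, the Python below extracts those fields with a regular expression. The pattern and the function name `parse_variant` are illustrative assumptions, not part of any SWISS-PROT tooling, and the sketch handles single-residue substitutions only (a "Missing" entry such as position 570 is not matched).

```python
import re

# Matches single-residue substitutions such as
# "FT VARIANT 574 574 G -> E (in sitosterolemia) ."
VARIANT_RE = re.compile(
    r"^FT\s+VARIANT\s+(\d+)\s+(\d+)\s+([A-Z])\s*->\s*([A-Z])\s*(?:\((.*?)\))?"
)


def parse_variant(line):
    """Return (position, original, replacement, note), or None if no match."""
    m = VARIANT_RE.match(line.strip())
    if m is None:
        return None
    start, _end, ref, alt, note = m.groups()
    return int(start), ref, alt, note


lines = [
    "FT VARIANT 574 574 G -> E (in sitosterolemia) .",
    "FT VARIANT 575 575 G -> R.",
    "FT /FTId=VAR_012263.",
]
for line in lines:
    print(parse_variant(line))
```

Lines that are not substitution variants, such as the /FTId continuation lines, return None, so the caller can skip them.
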

SQ SEQUENCE 673 AA; 75678 MW; 594AFD1D6C1BB50F CRC64; 



Query Match 21.0%; Score 697; DB 1; Length 673; 

Best Local Similarity 28.9%; Pred. No. 2.9e-42; 

Matches 187; Conservative 124; Mismatches 241; Indels 96; Gaps 16; 

Qy 8 TPGGSMGLQVNRGSQS SLEGAPAT-APEPHSLGI LHAS YSVSHRVR- PWWD- ITSCRQQW 64 

II : I I I I I | : : | : : | : : | | : I I : : : : I 

Db 16 TPQDTSGLQDRLFSSESDNSLYFTYSGQPNTLEVRDLNYQVDLASQVPWFEQLAQFKMPW 75 

Qy 65 TRQI LKDVSL YVESGQIMCI LGS S GS GKTTLLDAMS GRLGRAGT F- LGEVYV 115 

I : : : : I I I I I : : I : I I I I I : : I I I : : I I I I I : : : : 

Db 76 TSPSCQNSCELGIQNLSFKVRSGQMLAIIGSSGCGRASLLDVITGR-GHGGKIKSGQIWI 134 

Qy 116 NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI-RRGNPGSFQKKVEAVMAE 174 

II: : : I :: I I : I I : I I I I I I I : I : : I : I : II I : I I 

Db 135 NGQPSSPQLVRKCVAHVRQHNQLLPNLTVRETLAFIAQMRLPRTFSQAQRDKRVEDVIAE 194 

Qy 175 LSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVL 234 

II II : I I : I : I I I I I I I I I I I I : I : : : I I I I : I I I I I : : I 

Db 195 LRLRQCADTRVGNMYVRGLSGGERRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVKT 254 

Qy 235 LVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEH 294 

I I I : | I : I : : : : I I I I I : : I : I I I : : : : I I : I I : : I INN: 
Db 255 LSRLAKGNRLVLISLHQPRSDIFRLFDLVLLMTSGTPIYLGAAQHMVQYFTAIGYPCPRY 314 

Qy 295 SNP FDFYMDLT S VDTQS KEREI ET S KRVQMI ESAYKKS AI CHKTLKNI ERMKHL 348 

I I I I I I : I I I I : I : I : I : I : I : : I : : : : | : : : I 

Db 315 SNPADFYVDLTSIDRRSREQELATREKAQSLAALF LEKVRDLDDFLWK 362 

Qy 349 KTLPM VP FKT KD S P GVF S KLGVL L RRVT RN L VRN KLAVI T RL 390 

I : : I I I I I : I : I I I I : :: 
Db 363 AETKDLDEDTCVESSVTPLDTNCLPSPTK-MPGAVQQFTTLIRRQISNDFRDLPTLLIHG 421 

Qy 391 LQNLIMGLFLLFFVLRVRSNVLKGAIQ D RVGL L YQ FVGAT P YT GMLN AVN L FP VL R 446 

: : I : : I I : I I I II: I : : I : :: I 

Db 422 AEACLMSMTIGFLYFG HGS IQLS FMDTAALLFMIGALI PFNVI LDVI SKCYSER 475 

Qy 447 AVSDQESQDGLYQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARF 499 

I : I : I I I I I I II : I : II I I : I 

Db 476 AMLYYELEDGLYTTGPYFFAKILGELPEHCAYIIIYGMPTYWLANLRPGLQPFLLHFLLV 535 

Qy 500 GYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSWALLSIAGVLVGSGFL 549 

: I I I I : I : I : : : I I II.: 
Db 536 WLWFCCRIMALAAAALLPTFHMASFFS NALYNSFYLAG GFM 577 



Qy 550 RNIQEMPI PFKI I S YFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTN 597 

I: : || :| ::| | |: :| : |::: : 

Db 578 INLSSLWTVPAWISKVS FLRWCFEGLMKIQFSRRTYKMPLGNLTIAVS 625 



RESULT 6 
ABG8_RAT 



ID ABG8_RAT STANDARD; PRT; 694 AA. 

AC P58428; Q8CIQ5; Q923R7; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 15-MAR-2004 (Rel. 43, Last sequence update) 

DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 8 (Sterolin-2). 

GN ABCG8. 

OS Rattus norvegicus (Rat). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Rodentia; Sciurognathi; Muridae; Murinae; Rattus. 

OX NCBI_TaxID=10116; 

RN [1] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 2). 

RC STRAIN=Sprague-Dawley; 

RX MEDLINE=21344600; PubMed=11452359; 

RA Lu K., Lee M.-H., Hazard S., Brooks-Wilson A., Hidaka H., Kojima H., 

RA Ose L., Stalenhoef A.F.H., Mietinnen T., Bjorkhem I., Bruckert E., 

RA Pandya A., Brewer H.B. Jr., Salen G., Dean M., Srivastava A.K., 

RA Patel S.B.; 

RT "Two genes that map to the STSL locus cause sitosterolemia: genomic 

RT structure and spectrum of mutations involving sterolin-1 and 

RT sterolin-2, encoded by ABCG5 and ABCG8, respectively."; 

RL Am. J. Hum. Genet. 69:278-290(2001). 

RN [2] 

RP REVISIONS TO 3-4. 

RA Lu K., Yu H., Lee M.-H., Patel S.B.; 

RL Submitted (AUG-2002) to the EMBL/GenBank/DDBJ databases. 

RN [3] 

RP SEQUENCE FROM N.A. (ISOFORMS 1 AND 3), AND TISSUE SPECIFICITY. 

RC STRAIN=GH, SHR, SHRSP, Sprague-Dawley, Wistar, Wistar Kyoto, and WKA; 

RC TISSUE=Intestine, and Liver; 

RX PubMed=12783625; 

RA Yu H., Pandit B., Klett E., Lee M.-H., Lu K., Helou K., Ikeda I., 

RA Egashira N., Sato M., Klein R., Batta A., Salen G., Patel S.B.; 

RT "The rat STSL locus: characterization, chromosomal assignment, and 

RT genetic variations in sitosterolemic hypertensive rats."; 

RL BMC Cardiovasc. Disord. 3:4-4(2003). 

CC -!- FUNCTION: Transporter that appears to play an indispensable role 
CC in the selective transport of the dietary cholesterol in and out 

CC of the enterocytes and in the selective sterol excretion by the 

CC liver into bile. 

CC -!- SUBUNIT: May form heterodimers with ABCG5 or be tightly coupled to 
CC ABCG5 along a pathway regulating dietary-sterol absorption and 

CC excretion (By similarity). 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=3; 

CC Name=3; 

CC IsoId=P58428-3; Sequence=Displayed; 

CC Name=1; 

CC IsoId=P58428-1; Sequence=VSP_008767; 

CC Name=2; 

CC IsoId=P58428-2; Sequence=VSP_008767, VSP_000054; 

CC Note=No experimental confirmation available; 

CC -!- TISSUE SPECIFICITY: Highest expression in liver, with lower levels 
CC in small intestine and colon. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC -!- CAUTION: Seems to have a defective ATP-binding region. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; AF351785; AAK84831.2; 

DR EMBL; AY145899; AAN64276.1; 

DR EMBL; AF404109; AAK85393.1; 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter. 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1. 

DR SMART; SM00382; AAA; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 

KW Glycoprotein; Transmembrane; Transport; Alternative splicing. 

FT VARSPLIC 56 77 Missing (in isoform 1 and isoform 2). 

FT /FTId=VSP_008767. 

FT VARSPLIC 398 398 Missing (in isoform 2). 

FT /FTId=VSP_000054. 

FT CONFLICT 3 4 EK -> QT (IN REF. 3). 


SQ SEQUENCE 694 AA; 78236 MW; 67F67C195F417587 CRC64; 



Query Match 20.8%; Score 692.5; DB 1; Length 694; 

Best Local Similarity 29.3%; Pred. No. 6.4e-42; 

Matches 189; Conservative 122; Mismatches 255; Indels 79; Gaps 17; 

Qy 34 EPH-SLGILHASYSVSHRVRPW WD I T S C RQQWT RQ I LKDVS L YVE S GQ IM 82 

: I I I I I I I : : : I I I I : I I : : : I I I I I : : 

Db 67 DPHMSLG-LSESVDMASQV- PWFEQLAQFKLPWRSRGSQDSWDLGI-RNLSFKVRSGQML 123 



Qy 83 CILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSL 142 

I : I I : I I : I I I I : : I I I : : : : I I : : I I : : I I I I I : I 

Db 124 AIIGSAGCGRATLLDVITGRDHGGKMKSGQIWINGQPSTPQLIQKCVAHVRQQDQLLPNL 183 

Qy 143 TVRETLHYTALLAI RRGNPGS F QKKVEAVMAELSLSHVADRLIGNYSLGGISTGE 197 

I M I I I : I : : I : I I : I I I : I I I I I : : I I : I : I I I 

Db 184 TVRETLTFIAQMRL PKTFSQAQRDKRVEDVIAELRLRQCANTRVGNTYVRGVSGGE 239 

Qy 198 RRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELF 257 

I I I I I I I II : I : : : M I I : I I I I I : : I I I I : I I : I : : : : I | | | | : : | 
Db 240 RRRVSIGVQLLWNPGILILDEPTSGLDSFTAHNLVRTLSRLAKGNRLVLISLHQPRSDIF 299 

Qy 258 QLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIE 317 

: I I I : : : : I hi I : : I I I I I I : I I I I I I : I I I I : I : I I I : I : 

Db 300 RLFDLVLLMTSGTPIYLGVAQHMVQYFTSIGYPCPRYSNPADFYVDLTSIDRRSKEQEVA 359 

Qy 318 TSKRVQMIESAYKKSA ICHKTLKNIERMKHLKTLPMVPFKTKDS PG 363 

I ::::::: : : |::: | : |:|: || 

Db 360 TMEKARLLAALFLEKVQGFDDFLWKAEAKSLD TGTYAVSQTLTQDTNCGTAAELPG 415 

Qy 364 VFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRVGLL 423 

: : I : I I I I : : : : | | : | I : I II 

Db 416 MIQQFTTLIRRQISNDFRDLPTLFIHGAEACLMSLIIGFLYYGHADKPL — SFMDMAALL 473 

Qy 424 YQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFS 483 

: I : : | : | : I : : I : I I I I I I I I : I : 

Db 474 FMIGALIPFNVILDWSKCHSERSLLYYELEDGLYTAGPYFFAKVLGELPEHCAYVIIYG 533 

Qy 484 SVCYWTLGLHPEVARFGYFSAALLAPHLIGEFL TLVLLGIVQNPNI-VNSWAL 536 

I I I I I | : : | I : I I :: I 

Db 534 MPIYWLTNLRP GPELFLLHFMLLWLWFCCRTMALAASAMLPTFHMSSFCCN 585 

Qy 537 LSIAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVVNEFYGLNFTCGSSNVSVTT 596 

: : I I : I : : I II : I : : I I : : I I : I I : : : 

Db 586 ALYNSFYLTAGFMINLNNLWIVPAWISKMSFLRWCFSGLMQIQFNGHIYTTQIGNLTFSV 645 

Qy 597 NPMCAFTQGIQFI EKTCPG — ATSRFTMNFLILYS FI PALVI LGI 639 

II : : I II: I : : : I I 
Db 646 PGDAMVTAMDLNSHPLYAIY— LIVIGI 671 

RESULT 7 
ABG2_HUMAN 



ID ABG2_HUMAN STANDARD; PRT; 655 AA. 

AC Q9UNQ0; O95374; Q9BY73; Q9NUS0; 

DT 16-OCT-2001 (Rel. 40, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 28-FEB-2003 (Rel. 41, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 2 (Placenta-specific ATP- 

DE binding cassette transporter) (Breast cancer resistance protein). 

GN ABCG2 OR ABCP OR BCRP OR BCRP1. 

OS Homo sapiens (Human). 

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi; 

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 



RP SEQUENCE FROM N.A. 

RC TISSUE=Placenta; 

RX MEDLINE=99065313; PubMed=9850061; 

RA Allikmets R., Schriml L.M., Hutchinson A., Romano-Spica V., Dean M.; 

RT "A human placenta-specific ATP-binding cassette gene (ABCP) on 

RT chromosome 4q22 that is involved in multidrug resistance."; 

RL Cancer Res. 58:5337-5339(1998). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Breast cancer; 

RX MEDLINE=99080071; PubMed=9861027; 

RA Doyle L.A., Yang W., Abruzzo L.V., Krogmann T., Gao Y., Rishi A.K., 

RA Ross D.D.; 

RT "A multidrug resistance transporter from human MCF-7 breast cancer 

RT cells."; 

RL Proc. Natl. Acad. Sci. U.S.A. 95:15665-15670(1998). 

RN [3] 

RP ERRATUM. 

RA Doyle L.A., Yang W., Abruzzo L.V., Krogmann T., Gao Y., Rishi A.K., 

RA Ross D.D.; 

RL Proc. Natl. Acad. Sci. U.S.A. 96:2569-2569(1999). 

RN [4] 

RP SEQUENCE FROM N.A. 

RA Kage K., Tsukahara S., Sugiyama T., Asada S., Ishikawa E., Tsuruo T., 

RA Sugimoto Y.; 

RT "Breast cancer resistance protein constitutes a 140-kDa complex as a 

RT homodimer."; 

RL Submitted (MAR-2001) to the EMBL/GenBank/DDBJ databases. 

RN [5] 

RP SEQUENCE OF 198-655 FROM N.A. 

RC TISSUE=Placenta; 

RA Isogai T., Ota T., Hayashi K., Sugiyama T., Otsuki T., Suzuki Y., 

RA Nishikawa T., Nagai K., Sugano S., Shiratori A., Sudo H., 

RA Wagatsuma M., Hosoiri T., Kaku Y., Kodaira H., Kondo H., Sugawara M., 

RA Takahashi M., Chiba Y., Ishida S., Murakawa K., Ono Y., Takiguchi S., 

RA Watanabe S., Kimura K., Murakami K., Ishii S., Kawai Y., Saito K., 

RA Yamamoto J., Wakamatsu A., Nakamura Y., Nagahari K., Masuho Y., 

RA Ninomiya K., Iwayanagi T.; 

RT "NEDO human cDNA sequencing project."; 

RL Submitted (FEB-2000) to the EMBL/GenBank/DDBJ databases. 

RN [6] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207; 

RA Schmitz G., Langmann T., Heimerl S.; 

RT "Role of ABCG1 and other ABCG family members in lipid metabolism."; 

RL J. Lipid Res. 42:1513-1520(2001). 

CC -!- FUNCTION: Xenobiotic transporter that appears to play a major role 

CC in the multidrug resistance phenotype of a specific MCF-7 breast 

CC cancer cell line. When overexpressed, the transfected cells become 

CC resistant to mitoxantrone, daunorubicin and doxorubicin, display 

CC diminished intracellular accumulation of daunorubicin, and 

CC manifest an ATP-dependent increase in the efflux of rhodamine 123. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 

CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 



CC between the Swiss Institute of Bioinformatics and the EMBL outstation - 

CC the European Bioinformatics Institute. There are no restrictions on its 

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 

DR EMBL; AF103796; AAD09188.1; -. 

DR EMBL; AF098951; AAC97367.1; -. 

DR EMBL; AB056867; BAB39212.1; -. 

DR EMBL; AK002040; BAA92050.1; -. 

DR Genew; HGNC:74; ABCG2. 

DR MIM; 603756; 

DR GO; GO:0016021; C:integral to membrane; TAS. 

DR GO; GO:0005524; F:ATP binding; TAS. 

DR GO; GO:0004009; F:ATP-binding cassette (ABC) transporter acti...; TAS. 

DR GO; GO:0005215; F:transporter activity; TAS. 

DR GO; GO:0008559; F:xenobiotic-transporting ATPase activity; TAS. 

DR GO; GO:0009315; P:drug resistance; TAS. 

DR GO; GO:0006810; P:transport; TAS. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter. 

DR Pfam; PF00005; ABC_tran; 1. 

DR ProDom; PD000006; ABC_transporter; 1. 

DR SMART; SM00382; AAA; 1. 

DR PROSITE; PS00211; ABC_TRANSPORTER_1; FALSE_NEG. 

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1. 

KW ATP-binding; Transmembrane; Transport. 



FT DOMAIN 1 395 CYTOPLASMIC (POTENTIAL). 

FT TRANSMEM 396 416 POTENTIAL. 

FT DOMAIN 417 428 EXTRACELLULAR (POTENTIAL). 

FT TRANSMEM 429 449 POTENTIAL. 

FT DOMAIN 450 477 CYTOPLASMIC (POTENTIAL). 

FT TRANSMEM 478 498 POTENTIAL. 

FT DOMAIN 499 506 EXTRACELLULAR (POTENTIAL). 

FT TRANSMEM 507 527 POTENTIAL. 

FT DOMAIN 528 535 CYTOPLASMIC (POTENTIAL). 

FT TRANSMEM 536 556 POTENTIAL. 

FT DOMAIN 557 630 EXTRACELLULAR (POTENTIAL). 

FT TRANSMEM 631 651 POTENTIAL. 

FT DOMAIN 652 655 CYTOPLASMIC (POTENTIAL). 

FT NP_BIND 80 87 ATP (POTENTIAL). 

FT CARBOHYD 418 418 N-LINKED (GLCNAC...) (POTENTIAL). 

FT CARBOHYD 557 557 N-LINKED (GLCNAC...) (POTENTIAL). 

FT CARBOHYD 596 596 N-LINKED (GLCNAC...) (POTENTIAL). 

FT CONFLICT 24 24 V -> A (IN REF. 2 AND 4). 

FT CONFLICT 166 166 E -> Q (IN REF. 2 AND 4). 

FT CONFLICT 208 208 F -> S (IN REF. 1). 

FT CONFLICT 315 316 MISSING (IN REF. 5). 

FT CONFLICT 482 482 R -> T (IN REF. 2). 


SQ SEQUENCE 655 AA; 72343 MW; 89A6D3511DC5CCE0 CRC64; 



Query Match 20.3%; Score 676.5; DB 1; Length 655; 

Best Local Similarity 29.0%; Pred. No. 8.2e-41; 

Matches 181; Conservative 137; Mismatches 251; Indels 55; Gaps 18; 



Qy 21 SQSSLEGAPATAPEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVE 77 



II: I I I I : : | :::::: I I : I I : : : I I : : : : : 

Db 13 SQGNTNGFPATVSNDLKAFTEGAVLSFHNICYRVKLKSGFLPCRKPVEKEILSNINGIMK 72 

Qy 78 SGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDT 137 

I : I I I : I I I : : I II : : I : I I : I : I I I I I : I I : I I 

Db 73 PG-LNAILGPTGGGKSSLLDVLA7^RKDPSG-LSGDVLING-APRPANFKCNSGYWQDDV 129 

Qy 138 LLSSLTVRETLHYTALLAI RRGNPG-SFQKKVEAVMAELSLSHVADRLIGNYSLGGISTG 196 

: : : I II I I I : : I I : : : : hill III : I : I : I I 

Db 130 VMGTLTVRENLQFSAALRLATTMTNHEKNERINRVIEELGLDKVADSKVGTQFIRGVSGG 189 

Qy 197 ERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSEL 256 

11:111 : I : I I : : I I I I I I I I I I I : : : I I : : : : I : : : I I I I I : 
Db 190 ERKRTSIGMELITDPSILFLDEPTTGLDSSTANAVLLLLKRMSKQGRTIIFSIHQPRYSI 249 

Qy 257 FQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQ SK 312 

I : I I I : : I : I I : I I I I : I III : : I I I I : : I : : I : : : 

Db 250 FKLFDSLTLLASGRLMFHGPAQEALGYFESAGYHCEAYNNPADFFLDIINGDSTAVALNR 309 

Qy 313 ERE -IETSKR VQMIESAYKKSAICHKT LKNIERMKHLKTLPMVPF 356 

I : II II: : : : I I : : I I I : I : : : 

Db 310 EEDFKATEIIEPSKQDKPLIEKLAEIYVNSSFYKETKAELHQLSGGEKKKKITVFKEISY 369 

Qy 357 KTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL — LFFVLRVRSNVLKG 414 

I : I : : I : I I : I I I : : : : : : I I : : : I I : I 
Db 370 TT SFCHQLRWVSKRSFKNLLGNPQASIAQIIVTWLGLVIGAIYFGLKNDST 421 

Qy 415 AIQDRVGLLYQFVGATPYTGMLNAWLFPVLRAVSDQESQDGLYQKWQMMLAYAL-HVLP 473 

I I : I I : I : I : :: I I I I I : : I I I : I I : I I 

Db 422 GIQNRAGVLF-FLTTNQCFSSVSAVELFWEKKLFIHEYISGYYRVSSYFLGKLLSDLLP 480 

Qy 474 FSWATMI FS SVCYWTLGLHPEVARFGYFSAALIAPHLI GEFLTLVLLGI VQNPNI VNSV 533 

: : : : I I : : I : I I I I : I I : : : : : I I : : I : 

Db 481 MRMLPS 1 1 FTCI VYFMLGLKPKADAFFVMMFTLM MVAYSAS SMALAI AAGQ'S WS VA 537 

Qy 534 VALLSIAGV— LVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILWNEFYGLNFTCGSSN 591 

I : : I I : : I I I I : : : I I : : I I I I I I II I 
Db 538 TLLMTICFVFMMIFSGLLVNLTTIASWLSWLQYFSIPRYGFTALQHNEFLGQNFCPG 594 

Qy 592 VSVTTNPMCAFTQGIQFI EKTCPG 615 

: : I I I : III 
Db 595 LNATGNNPCNYA TCTG 610 



RESULT 8 
YOH5_YEAST 

ID YOH5_YEAST STANDARD; PRT; 1294 AA. 

AC Q08234; Q08233; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 16-OCT-2001 (Rel. 40, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE Probable ATP-dependent transporter YOL074C/YOL075C.

GN YOL074C/YOL075C. 

OS Saccharomyces cerevisiae (Baker's yeast). 

OC Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
OC Saccharomycetales; Saccharomycetaceae; Saccharomyces.
OX NCBI_TaxID=4932;



RN [1] 

RP SEQUENCE FROM N.A.

RX MEDLINE=97321807; PubMed=9178509;

RA Tzermia M., Katsoulou C., Alexandraki D.;

RT "Sequence analysis of a 33.2 kb segment from the left arm of yeast 

RT chromosome XV reveals eight known genes and ten new open reading 

RT frames including homologues of ABC transporters, inositol 

RT phosphatases and human expressed sequence tags."; 

RL Yeast 13:583-589(1997). 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Potential). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; Z74817; CAA99085.1; -. 

DR EMBL; Z74816; CAA99084.1; -. 

DR PIR; S77690; S77690. 

DR GermOnline; 143497; -. 

DR SGD; S0005435; YOL075C. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 2.

DR ProDom; PD000006; ABC_transporter; 2.

DR SMART; SM00382; AAA; 2.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 2.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 2.

KW Hypothetical protein; ATP-binding; Transmembrane; Glycoprotein;

KW Transport; Repeat.








FT TRANSMEM 376 396 POTENTIAL.

FT TRANSMEM 496 516 POTENTIAL.

FT TRANSMEM 531 551 POTENTIAL.

FT TRANSMEM 605 625 POTENTIAL.

FT TRANSMEM 1039 1059 POTENTIAL.

FT TRANSMEM 1121 1141 POTENTIAL.

FT TRANSMEM 1267 1287 POTENTIAL.

FT NP_BIND 62 69 ATP (POTENTIAL).

FT NP_BIND 727 734 ATP (POTENTIAL).




FT CARBOHYD 41 41 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 86 86 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 101 101 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 151 151 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 341 341 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 349 349 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 371 371 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 528 528 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 983 983 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 1062 1062 N-LINKED (GLCNAC...) (POTENTIAL).


SQ SEQUENCE 1294 AA; 145157 MW; C555500A45E9234E CRC64;



Query Match 18.9%; Score 627; DB 1; Length 1294; 

Best Local Similarity 31.7%; Pred. No. 6.9e-37; 



Matches 181; Conservative 106; Mismatches 228; Indels 56; Gaps 19; 



Qy 65
Db 706

Qy 120
Db 765

Qy 179
Db 825

Qy 239
Db 885

Qy 297
Db 945

Qy 357
Db 995

Qy 402
Db 1054

Qy 462
Db 1111

Qy 519
Db 1168

Qy 578
Db 1224



I : : I I : I : : | | I : I I I I I I : : I I : : | | || 



I I I I I I I : : I I I : I I I I I I : : : : : : | | 



Ml : II I I I : I I I : : I I I I I : : I I I I I : I I I I : I : : I : I 



: I II I I I I I I I : I : : I : I I I : I I I : : I : II I I 



I I : : I I I I : I I : : : I I : I I : I I I : I I : : : | | : 



[LV RNKLAVTTRLLQNLIMGLFLL 401 

II | : : : : | : | : | : 



I : I : I : I : I I : I : I I I : : I | : | | : | 

7VPVKHNYT — SI SNRLGLAQEST-ALYFVGMLGNLACYPTERDYFYEEYNDNVYGIA 1110 

ILAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSA7VLLAPHLI GEFLTL 518 

Ml III :|::::: II Mill: :: III: 

'LAYMT L E L P L S ALAS VL YAVFT VLAC GL - P RT A — GNFFATVYCSFIVTCCGERLGI 1167 



: : I : I I : : I I I : I I I | : | III:: 

MTNTFFERPGFWNCISIILSIGTQMSGLMSL GMSRVLKGFNYLNPVGYTSMIIIN 1223 

NEFYG-LNFTC — GS SNVS VTTNPMCAFTQG 605 

I I I I I I I I I I | 

FAFPGNLKLTCEDGGKNSDGT CEFANG 1250 



RESULT 9 
WHIT_LUCCU 

ID WHIT_LUCCU STANDARD; PRT; 677 AA. 

AC Q05360; 

DT 01-FEB-1995 (Rel. 31, Created) 

DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Lucilia cuprina (Greenbottle fly) (Australian sheep blowfly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; Oestroidea;

OC Calliphoridae; Lucilia.

OX NCBI_TaxID=7375;



RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=97087158; PubMed=8933176; 

RA Garcia R.L., Perkins H.D., Howells A.J.; 

RT "The structure, sequence and developmental pattern of expression of 

RT the white gene in the blowfly Lucilia cuprina."; 

RL Insect Mol. Biol. 5:251-260(1996). 

RN [2] 

RP SEQUENCE OF 490-584 FROM N.A. 

RX MEDLINE=90264941; PubMed=1971656; 

RA Elizur A., Vacek A.T., Howells A.J.;

RT "Cloning and characterization of the white and topaz eye color genes 

RT from the sheep blowfly Lucilia cuprina."; 

RL J. Mol. Evol. 30:347-358(1990). 

CC -!- FUNCTION: May be part of a membrane-spanning permease system 
CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U38899; AAA82057.1; -.

DR EMBL; X53265; CAA37365.1; -.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport.

FT NP_BIND 119 126 ATP (POTENTIAL).

FT TRANSMEM 431 451 POTENTIAL.

FT TRANSMEM 456 476 POTENTIAL.

FT TRANSMEM 506 526 POTENTIAL. 

FT TRANSMEM 534 554 POTENTIAL. 

FT TRANSMEM 563 583 POTENTIAL. 

FT TRANSMEM 647 667 POTENTIAL. 

SQ SEQUENCE 677 AA; 75365 MW; D16FC11C97EED51D CRC64; 

Query Match 18.7%; Score 623; DB 1; Length 677; 

Best Local Similarity 27.0%; Pred. No. 5.8e-37; 

Matches 188; Conservative 144; Mismatches 260; Indels 104; Gaps 19; 

Qy 8 T PGGSMGLQVNRGSQS S LEGAPATAPEPH S LGI LHAS YS VSHRVRPWW DITSC 60 

III : I I : : I I I : : I I : : I : : 

Db 27 TPG TLEASAINSGFSKSYGSLVSNESASEKLTYSWCNLDVFGEVHQP 73 



Qy 61 RQQW TRQI LKDVS LYVES GQIMCI LGS S GS GKTTLLDAMS 100 

I : : : I : I I : : : : : I I I I : I I I I I I : I : :■ 

Db 74 G S NWKQ LVN RVKGVFCN E RH I P K P RKH L I KNVC GVAY P GE L LAVMG S S GAGKT T L LNALA 133 

Qy 101 GRLGRAGT FLGEVYV NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIR 157 

I II : I II : : : I : I I I I : M I I I I : I : : 

Db 134 FRS AR- GVQI S P S S VRMLNGH PVDAKEMQARCAYVQQDDLFI GS LTAREHLI FQATVRMP 192 

Qy 158 RGNPGSFQ-KKVEAVMAELSLSHVADRLIG-NYSLGGISTGERRRVSIAAQLLQDPKVML 215

I : : : | : | : : | | | : : | | : | : | | | | : | : : | : : | | | : : : 

Db 193 RTMTQKQKLQRVDQVIQDLSLIKCQNTIIGVPGRVKGLSGGERKRLAFASEALTDPPLLI 252 

Qy 216 FDEPTTGLDCMTANQIWLLVELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCG 275 

IIIIMII I : I : I : I : : I : I : I I I I I I I I I I : I I I I I : : : I : I I 

Db 253 CDEPTSGLDSFMAASWQVLKKLSQRGKTVI LTIHQPSSELFELFDKILLMAEGRVAFLG 312 

Qy 276 TPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAIC 335 

I I I : I I I : I I I : I I I I I : : : I MM: I : I : : 

Db 313 TPVEAVDFFS FI GAQCPTNYNPADFYVQVLAV VPGREIESRDRISKICDNFAVGKVS 369 

Qy 336 HKTLKNIERMKHLKTLPMVPFKT KDSPGV FSKLGVLLRRVTRNLVRNKL 384

: : I : : : I I if I : : : : I : : : I 

Db 370 REMEQNFQK I AAKTDGLQKDDETTI LYKASWFTQFRAIMWRSWI STLKEPL 420 

Qy 385 AVITRLLQNLIMGLFL-LFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFP 443 

I I I : I : : : : I | : : : I : : I : : I : : : : I : I 

Db 421 LVKVRL I QTTMVAVL I GL I FLNQPMTQV GVMNINGAI FLFLTNMTFQNVFAVINVFT 477 

Qy 444 VLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMI FSSVCYWTLGLHPEVARFGYFS 503 

I : I : : II: I I II : I : I : : I : I I I : : I 
Db 478 SELPVFMRETRSRLYRCDTYFLGKTLAELPLFLWPFLFIAIAYPMIGLRPGIT HFL 534 

Qy 504 AALLAPHLIGEFLT LVLLGIVQNPNIVNSWALLSIAGVLVGSGFLRNIQEMPIPFK 560 

: I I I : I : : : : : I I I : I : I I I I I : I : I I 

Db 535 SAL7VLVTLVANVSTSFGYLISCASTSTSMALSVGPPLTIPFLLFGGVFL-NSGSVPVYFK 593 

Qy 561 IISYFTFQKYCSEILWNEFYGL NFTCGSSNVSVTTNPMCAFTQGIQFIEKTCP — G 615 

: I II : : : I : I I : : I : : : I I I : I III I 

Db 594 WLSYFSWFRYANEGLLINQWADVQPGEITCTSTNT TCPSSG 634 

Qy 616 AT S RFTMNFLI LYS FI PAL VI LGI WFKI RDHLI S R 651 

| : | | : : | : | | | : : | : | : : : 

Db 635 XVXLETLNFRDKFTFRLYGLILLILIFRIAGYVAXK 670 



RESULT 10 
ADP1_YEAST 

ID ADP1_YEAST STANDARD; PRT; 1049 AA. 

AC P25371; 

DT 01-MAY-1992 (Rel. 22, Created) 

DT 01-MAY-1992 (Rel. 22, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE Probable ATP-dependent permease precursor. 

GN ADP1 OR YCR011C OR YCR11C OR YCR105. 

OS Saccharomyces cerevisiae (Baker's yeast). 

OC Eukaryota; Fungi; Ascomycota; Saccharomycotina; Saccharomycetes;
OC Saccharomycetales; Saccharomycetaceae; Saccharomyces.

OX NCBI_TaxID=4932;

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=92160395; PubMed=1789009;

RA Purnelle B., Skala J., Goffeau A.;

RT "The product of the YCR105 gene located on the chromosome III from

RT Saccharomyces cerevisiae presents homologies to ATP-dependent

RT permeases.";

RL Yeast 7:867-872(1991). 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=92327849; PubMed=1626432;

RA Skala J., Purnelle B., Goffeau A.;

RT "The complete sequence of a 10.8 kb segment distal of SUF2 on the 

RT right arm of chromosome III from Saccharomyces cerevisiae reveals 

RT seven open reading frames including the RVS161, ADP1 and PGK genes."; 

RL Yeast 8:409-417(1992). 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Potential). 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; X59720; CAA42328.1; -.

DR PIR; S19421; S19421.

DR GermOnline; 138916; -.

DR SGD; S0000604; ADP1.

DR GO; GO:0005783; C:endoplasmic reticulum; IDA.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Transmembrane; Glycoprotein; Transport; Signal.


FT SIGNAL 1 25 POTENTIAL.

FT CHAIN 26 1049 PROBABLE ATP-DEPENDENT PERMEASE.

FT NP_BIND 423 430 ATP (BY SIMILARITY).

FT TRANSMEM 325 345 POTENTIAL.

FT TRANSMEM 464 481 POTENTIAL.

FT TRANSMEM 794 814 POTENTIAL.

FT TRANSMEM 829 849 POTENTIAL.

FT TRANSMEM 878 898 POTENTIAL.

FT TRANSMEM 910 930 POTENTIAL.

FT TRANSMEM 938 958 POTENTIAL.

FT TRANSMEM 1001 1021 POTENTIAL.

FT TRANSMEM 1025 1045 POTENTIAL.

FT CARBOHYD 50 50 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 114 114 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 165 165 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 221 221 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 815 815 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 935 935 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 960 960 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 971 971 N-LINKED (GLCNAC...) (POTENTIAL).

SQ SEQUENCE 1049 AA; 117231 MW; ABC9CE54BCFDF6A3 CRC64; 

Query Match 18.7%; Score 621; DB 1; Length 1049; 

Best Local Similarity 28.6%; Pred. No. 1.4e-36; 

Matches 196; Conservative 111; Mismatches 223; Indels 156; Gaps 22; 

Qy 68 ILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQD 127 

:| ::| I : I I I : I : I I I : I I I I I I I : : : : I I : III :: |: I 
Db 405 VXNEISGIVKPGQILAIMGGSGAGKTTLLDIL7\MK-RKTGHVSGSIKVNGISMDRKSFSK 463 

Qy 128 CFS YVLQSDTLLS S LTVRETLHYTALLAI RRGNPGS FQKK VEAVMAELSLSHVADRL 184 

: I I I I I : I I I II : : I I I : : I I : I I I : I I : : I I : 

Db 464 1 1 GFVDQDDFLLPTLTVFETVLNSALLRLPKAL — S FEAKKARVYKVLEELRI I DI KDRI 521 

Qy 185 IGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQI VVLLV^LAR-RNR 243 

III I I I I I : I I I I I I : I : I I : I I I I : I I I I I : : III: II 

Db 522 IGNEFDRGISGGEKRRVSIACELVTSPLVLFLDEPTSGLDASNANNVIECLVRLSSDYNR 581 

Qy 244 IWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMD 303 

: I I : I I I I I I : I I I I I : : I I I I : : : I : : : I : I I I I : : I I : : I 
Db 582 TLVLSIHQPRSNIFYLFDKLVLLSKGEMVYSGNAKKVSEFLRNEGYICPDNYNIADYLID 641 

Qy 304 LT-SVDTQSKEREI 316 

:| I I I I 

Db 642 ITFEAGPQGKRRRIRNISDLEAGTDTNDIDNTIHQTTFTSSDGTTQREWAHLAAHRDEIR 701 

Qy 317 ETSKRVQMIESAYKKSAICHKTLKNIERM 345 

I : : : III : : I I : 

Db 702 SLLRDEEDVEGTDGRRGATEIDLNTKLLHDKYKDSVYYAELSQEIEEVLSEGDEESNVLN 761 

Qy 346 KHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFLLFFVL 405

II : I I : I : I I : I : I I : : I : : : I I I 

Db 762 GDLPT GQQSAGFLQQLSILNSRSFKNMYRNPKLLLGNYLLTILLSLFLGTLYY 814 

Qy 406 RVRSNVLKGAIQDRVGLLY QFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQ 462 

I I I : I I : I : I I : : I : I I : : | : | : : | : | 

Db 815 NV- SNDI SG- FQNRMGLFFFI LT YFGFVTFTGL S S FALERI I FI KERSNN YYS P — 866 

Qy 463 MMLAYAL HVLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLT 517 

Ml: | : | | | : : | : | | | : : | : : | : | | 

Db 867 — LAYYISKIMSEWPLRWPPILLSLIVYPMTGLNMKDNAF-FKCIGILILFNLGISLE 923 

Qy 518 LVLLGIV QNPNIVNSWALLSIAGVLVGSGFLRNIQEMP-IPFKIISYFTFQKYCSE 573 

: : : I I : I : I : I I : I I I I : I I I : : : I I : I : I I 
Db 924 ILTIGIIFEDLNNSIILSVLVLL GSLLFSGLFINTKNITNVAFKYLKNFSVFYYAYE 980 

Qy 574 ILWNEF YGLN FTC GS SNVS VTTN PMCAFTQGI Q FI EKTC P GAT S RFTMN F 624 

l-M I M I llllll 
Db 981 SLLINEVKTLMLKERKYGLNI EVPGATILSTFGF 1014 

Qy 625 LILYS FIPALVILGI— WFKIRDHL 648 

: : : : : I I : I I I I : I 
Db 1015 - WQNLVFDI KI LALFNWFLIMGYL 1039 



RESULT 11 
WHIT_ANOGA 

ID WHIT_ANOGA STANDARD; PRT; 695 AA. 

AC Q27256; Q17006; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 01-NOV-1997 (Rel. 35, Last sequence update) 

DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Anopheles gambiae (African malaria mosquito).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;

OC Neoptera; Endopterygota; Diptera; Nematocera; Culicoidea; Anopheles.

OX NCBI_TaxID=7165; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Suakoko / G3; 

RX MEDLINE=96423158; PubMed=8825759;

RA Besansky N.J., Bedell J.A., Benedict M.Q., Mukabayire O., Hilfiker D.,

RA Collins F.H.;

RT "Cloning and characterization of the white gene from Anopheles

RT gambiae.";

RL Insect Mol. Biol. 4:217-231(1995). 

CC -!- FUNCTION: May be part of a membrane-spanning permease system 
CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 

DR EMBL; U29486; AAC46995.1; -. 

DR EMBL; U29485; AAC46994.1; -. 

DR EMBL; U29484; AAC47423.1; -.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR008965; Cellul_bind.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport. 
FT NP_BIND 133 140 ATP (POTENTIAL).

FT NP_BIND 288 295 ATP (POTENTIAL).

FT TRANSMEM 444 464 POTENTIAL.

FT TRANSMEM 474 494 POTENTIAL.

FT TRANSMEM 524 544 POTENTIAL.

FT TRANSMEM 552 572 POTENTIAL.

FT TRANSMEM 581 601 POTENTIAL.

FT TRANSMEM 669 689 POTENTIAL.

FT CARBOHYD 472 472 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 645 645 N-LINKED (GLCNAC...) (POTENTIAL).

FT CONFLICT 100 100 N -> S (IN REF. 1; AAC47423).

FT CONFLICT 691 693 SRS -> YAR (IN REF. 1; AAC47423).


SQ SEQUENCE 695 AA; 77218 MW; EE8B9517239B2961 CRC64;

Query Match 18.3%; Score 607.5; DB 1; Length 695;



Best Local Similarity 28.4%; Pred. No. 7.7e-36; 

Matches 170; Conservative 124; Mismatches 208; Indels 97; Gaps 17; 

Qy 58 TSCRQQWTRQ 1 LKDVS LYVES GQIMCI LGS S GS GKTTLLDAMS GRLGRAGT 108 

I I III : I I : I : : I I : : : : : I I I I : I I I I I I : I : : I I 

Db 96 TRLRNCCTRQRKDFNPRKHLLKNVTGVAKSGELLAVMGSSGAGKTTLLNALAFR-SPPGV 154 

Qy 109 FLGEVYV NGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSF- 164 

: I II : II: : I I I I : I I I I I I : I : I : I II 

Db 155 KISPNAVRALNGVPVNAEQLRARCAYVQQDDLFIPSLTTREHLLFQAMLRMGRDVPASVK 214 

Qy 165 QKKVEAVMAELSLSHVADRLIGNYS-LGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGL 223 

I : I : I : I I I I II : I I : I : I I I I : I : : I : : I II : : I I I I I : I I 

Db 215 QHRVQEVLQELSLVKCADTIIGAPGRIKGLSGGERKRLAFASETLTDPHLLLCDEPTSGL 274 

Qy 224 DCMTANQIWLLVELARRNRIWLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDF 283 

I I : : : : I : I : : : : I I I I II I I I : I I I I I : : : I : I I : I : : I 
Db 275 DS FMAHS VLQVLKGMAMKGKT 1 1 LT I HQP S S ELYCLFDKI LLVAEGRVAFLGS P YQS AEF 334 

Qy 284 FNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIE 343 

I : lll|:|||||:::: : : | | : : | : : | | : | : 

Db 335 FSQLGIPCPPNYNPADFYVQMLAI APAKEAECRDMIKKICDSFAVSPIAREVLETAS 391 

Qy 344 RMKHL KT L PMVP FKT KD S P GVF S KL - GV LLRRVTRNLVRNKLAVI 387 

I I I I : :: I I : I I :::::: | 

Db 392 VAGKGMDEP YMLQQVEGVG S TGYRS S WWTQ F YC I LWRS WL S VLKD PMLVK 441 

Qy 388 TRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRVGL L YQ FVGAT P YT GMLNAVN L 441

MM:: : : : | : | : | : | : \: : : : | : 

Db 442 VRL LQT AMVAT L I GSIYFGQVLDQDGVMNINGSLFLFLTNMTFQNVFAVTNV 493 

Qy 442 FPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSWATMIFSSVCYWTLGLHPEVARFGY 501 

I I : I : I I: I : II : =1 = 1= I =11 II 
Db 494 FSAELPVFLREKRSRLYRVDTYFLGKTIAELPLFIAVPFVFTSITYPMIGL RTG- 547 

Qy 502 FSAALLAPHLIGEFLTLVLLGIVQNPN IVNSWALLSIA GVLVGSG 547 

11= || : : : | | : : : : : : | : | | : | 

Db 548 ATHYL TTLFI VTLVANVSTS FGYLI SCASS S I SMALSVGPPWI PFLI FGG 598 

Qy 548 FLRNIQEMPI PFKI I SYFTFQKYCSEILWNEFYGL NFTCGSSNVSVTT 596 

I I = I II : II = = : I : I I : = I : : : Ml MM 

Db 599 FFLNSASVPAYFKYLSYLSWFRYANEALLINQWSTVVDGEIACTRANVTCPRSEIILET 657 



RESULT 12 
WHIT_DROME 

ID WHIT_DROME STANDARD; PRT; 687 AA.



AC P10090; Q9V3A2; Q9XY33; 

DT 01-MAR-1989 (Rel. 10, Created)

DT 01-NOV-1991 (Rel. 20, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE White protein. 

GN W OR EG:BACN33B1.1 OR CG2759. 

OS Drosophila melanogaster (Fruit fly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota; 

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha; 

OC Ephydroidea; Drosophilidae; Drosophila. 

OX NCBI_TaxID=7227; 

RN [1] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Head; 

RX MEDLINE=90221897; PubMed=2109311;

RA Pepling M., Mount S.M.; 

RT "Sequence of a cDNA from the Drosophila melanogaster white gene."; 

RL Nucleic Acids Res. 18:1633-1633(1990). 

RN [2] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=85134865; PubMed=6084717;

RA O'Hare K., Murphy C., Levis R., Rubin G.M.;

RT "DNA sequence of the white locus of Drosophila melanogaster.";

RL J. Mol. Biol. 180:437-455(1984). 

RN [3] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=21100348; PubMed=11156992;

RA Lukacsovich T., Asztalos Z., Awano W., Baba K., Kondo S., Niwa S.,

RA Yamamoto D.;

RT "Dual-tagging gene trap of novel genes in Drosophila melanogaster."; 

RL Genetics 157:727-742(2001). 

RN [4] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Berkeley; 

RX MEDLINE=20196006; PubMed=10731132;

RA Adams M.D., Celniker S.E., Holt R.A., Evans C.A., Gocayne J.D.,

RA Amanatides P.G., Scherer S.E., Li P.W., Hoskins R.A., Galle R.F.,

RA George R.A., Lewis S.E., Richards S., Ashburner M., Henderson S.N.,

RA Sutton G.G., Wortman J.R., Yandell M.D., Zhang Q., Chen L.X.,

RA Brandon R.C., Rogers Y.-H.C., Blazej R.G., Champe M., Pfeiffer B.D.,

RA Wan K.H., Doyle C., Baxter E.G., Helt G., Nelson C.R., Miklos G.L.G.,

RA Abril J.F., Agbayani A., An H.-J., Andrews-Pfannkoch C., Baldwin D.,

RA Ballew R.M., Basu A., Baxendale J., Bayraktaroglu L., Beasley E.M.,

RA Beeson K.Y., Benos P.V., Berman B.P., Bhandari D., Bolshakov S.,

RA Borkova D., Botchan M.R., Bouck J., Brokstein P., Brottier P.,

RA Burtis K.C., Busam D.A., Butler H., Cadieu E., Center A., Chandra I.,

RA Cherry J.M., Cawley S., Dahlke C., Davenport L.B., Davies P.,

RA de Pablos B., Delcher A., Deng Z., Mays A.D., Dew I., Dietz S.M.,

RA Dodson K., Doup L.E., Downes M., Dugan-Rocha S., Dunkov B.C., Dunn P.,

RA Durbin K.J., Evangelista C.C., Ferraz C., Ferriera S., Fleischmann W.,

RA Fosler C., Gabrielian A.E., Garg N.S., Gelbart W.M., Glasser K.,

RA Glodek A., Gong F., Gorrell J.H., Gu Z., Guan P., Harris M.,

RA Harris N.L., Harvey D.A., Heiman T.J., Hernandez J.R., Houck J.,

RA Hostin D., Houston K.A., Howland T.J., Wei M.-H., Ibegwam C.,

RA Jalali M., Kalush F., Karpen G.H., Ke Z., Kennison J.A., Ketchum K.A.,

RA Kimmel B.E., Kodira C.D., Kraft C., Kravitz S., Kulp D., Lai Z.,

RA Lasko P., Lei Y., Levitsky A.A., Li J.H., Li Z., Liang Y., Lin X.,

RA Liu X., Mattei B., McIntosh T.C., McLeod M.P., McPherson D.,

RA Merkulov G., Milshina N.V., Mobarry C., Morris J., Moshrefi A.,

RA Mount S.M., Moy M., Murphy B., Murphy L., Muzny D.M., Nelson D.L.,

RA Nelson D.R., Nelson K.A., Nixon K., Nusskern D.R., Pacleb J.M.,

RA Palazzolo M., Pittman G.S., Pan S., Pollard J., Puri V., Reese M.G.,

RA Reinert K., Remington K., Saunders R.D.C., Scheeler F., Shen H.,

RA Shue B.C., Siden-Kiamos I., Simpson M., Skupski M.P., Smith T.,

RA Spier E., Spradling A.C., Stapleton M., Strong R., Sun E.,

RA Svirskas R., Tector C., Turner R., Venter E., Wang A.H., Wang X.,

RA Wang Z.-Y., Wassarman D.A., Weinstock G.M., Weissenbach J.,

RA Williams S.M., Woodage T., Worley K.C., Wu D., Yang S., Yao Q.A.,

RA Ye J., Yeh R.-F., Zaveri J.S., Zhan M., Zhang G., Zhao Q., Zheng L.,

RA Zheng X.H., Zhong F.N., Zhong W., Zhou X., Zhu S., Zhu X., Smith H.O.,

RA Gibbs R.A., Myers E.W., Rubin G.M., Venter J.C.;

RT "The genome sequence of Drosophila melanogaster.";

RL Science 287:2185-2195(2000). 

RN [5] 

RP SEQUENCE FROM N.A. 

RC STRAIN=Oregon-R; 

RX MEDLINE=20196011; PubMed=10731137;

RA Benos P.V., Gatt M.K., Ashburner M., Murphy L., Harris D.,

RA Barrell B.G., Ferraz C., Vidal S., Brun C., Demailles J., Cadieu E.,

RA Dreano S., Gloux S., Lelaure V., Mottier S., Galibert F., Borkova D.,

RA Minana B., Kafatos F.C., Louis C., Siden-Kiamos I., Bolshakov S.,

RA Papagiannakis G., Spanos L., Cox S., Madueno E., de Pablos B.,

RA Modolell J., Peter A., Schoettler P., Werner M., Mourkioti F.,

RA Beinert N., Dowe G., Schaefer U., Jaeckle H., Bucheton A.,

RA Callister D.M., Campbell L.A., Darlamitsou A., Henderson N.S.,

RA McMillan P.J., Salles C., Tait E.A., Valenti P., Saunders R.D.C.,

RA Glover D.M.;

RT "From sequence to chromosome: the tip of the X chromosome of D. 

RT melanogaster."; 

RL Science 287:2220-2222(2000). 

RN [6] 

RP SEQUENCE OF 224-331 FROM N.A. 

RX MEDLINE=89339145; PubMed=2503416;

RA Tearle R.G., Belote J.M., McKeown M., Baker B.S., Howells A.J.;

RT "Cloning and characterization of the scarlet gene of Drosophila 

RT melanogaster."; 

RL Genetics 122:595-606(1989). 

CC -!- FUNCTION: Part of a membrane-spanning permease system necessary 
CC for the transport of pigment precursors into pigment cells 

CC responsible for eye color. White dimerizes with brown for the

CC transport of guanine and with scarlet for the transport of

CC tryptophan.

CC -!- SUBUNIT: Heterodimer of white with either brown or scarlet.

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC 



DR EMBL; X51749; CAA36038.1; -.

DR EMBL; X02974; CAA26716.1; -.

DR EMBL; AB028139; BAA78210.1; -.

DR EMBL; AE003425; AAF45826.1; -.

DR EMBL; AL133506; CAB65847.1; -.

DR EMBL; X76202; CAA53795.1; -.

DR PIR; S08635; FYFFW.

DR FlyBase; FBgn0003996; w.

DR GO; GO:0004888; F:transmembrane receptor activity; NAS.

DR GO; GO:0006727; P:ommochrome biosynthesis; IMP.

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport.

FT NP_BIND 130 137 ATP (BY SIMILARITY).

FT TRANSMEM 435 453 POTENTIAL.

FT TRANSMEM 465 485 POTENTIAL.

FT TRANSMEM 515 533 POTENTIAL.

FT TRANSMEM 542 563 POTENTIAL.

FT TRANSMEM 576 594 POTENTIAL.

FT TRANSMEM 659 678 POTENTIAL.

FT CONFLICT 25 29 GDSGA -> LIFEIPYHCRVTAD (IN REF. 2 AND

FT 3).

FT CONFLICT 49 49 L -> R (IN REF. 4 AND 5).

FT CONFLICT 335 371 VGAQCPTNYNPADFYVQVLAVVPGREIESRDRIAKIC ->

FT ITLHLNSYPAWVPSVLPTTIRRTFTYRCWPLCPDGRSSPVI

FT GSPRYG (IN REF. 3).

SQ SEQUENCE 687 AA; 75672 MW; 24AFAD799DE0D396 CRC64;



Query Match 18.1%; Score 602.5; DB 1; Length 687;

Best Local Similarity 28.8%; Pred. No. 1.7e-35;

Matches 180; Conservative 131; Mismatches 220; Indels 95; Gaps 19;



Qy 66 RQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGR--LGRAGTFLGEVYVNGRALRRE 123

: : I I : I I : : : : : I I I I : I I I I I I : I : : I I : I : I I : : :

Db 110 KHLLKNVCGVAYPGELLAVMGSSGAGKTTLLNALAFRSPQGIQVSPSGMRLLNGQPVDAK 169

Qy 124 QFQDCFSYVLQSDTLLSSLTVRETLHYTALLAI RRGNPGSFQK KVEAVMAELSLSHV 180

: I : I I I I : I I I I I I : I : : I : : : : : I : I : I I I I I

Db 170 EMQARCAYVQQDDLFIGSLTAREHLIFQAM--VRMPRHLTYRQRVARVDQVIQELSLSKC 227

Qy 181 ADRLIG-NYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELA 239

: I I : | : | | | | : | : : | : : | | | : : : | | | | : | | | I I : : I : | : | :

Db 228 QHTIIGVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFTAHSWQVLKKLS 287

Qy 240 RRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFD 299

: : : I : I I I I I I I I I I : I I I I I : : : I : I I I I : I : I I I : I I I : I I I

Db 288 QKGKTVILTIHQPSSELFELFDKILLMAEGRVAFLGTPSEAVDFFSYVGAQCPTNYNPAD 347

Qy 300 FYMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTK 359

I I : : : I I I I I : I : I : I I : : : I : : I I I

Db 348 FYVQVLAV VPGREIESRDRIAKICDNFAIS KVARDMEQLLATKNLE KPL 396



Qy 360 DSP GVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL-LFFVLRVRSN 410 

: I I : :| I :::: II ||:| :: : : | |: : : 

Db 397 EQPENGYTYKATWFMQFRAVLWRSWLSVLKEPLLVKVRLIQTTMVAILIGLIFLGQQLTQ 456 

Qy 411 VLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALH 470

I : : | :: |: : : :|:| I : I : : I I : I : 

Db 457 V GVMNINGAI FLFLTNMT FQNVFAT INVFT S ELPVFMREARS RLYRCDT YFLGKT I A 513 

Qy 471 VLPFSWATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIV 530 

II: : : I : : : I : I I | | I | | : : | | : 

Db 514 ELPLFLTVPLVFTAIAYPMIGLRAGVLHF FNCLALVTLV — ANVS 556 

Qy 531 N S WALL S I AG VLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEI 574 

I hi I I : I I I : I : I : | | : : : | : | 

Db 557 TSFGYLISCASSSTSMALSVGPPVIIPFLLFGGFFLNSGSVPVYLKWLSYLSWFRYANEG 616 

Qy 575 LWNEFYGL NFTCGSSNVSVTTNPMCAFTQGIQFIEKTCP — GATSRFTMNFLILYS 629 

I : : I : : : : I I M I I I I I : I I : 

Db 617 LLINQWADVEPGEISCTSSNT TCPSSGKVILETLNFSA — A 655 

Qy 630 FIP ALVI LGI WFKI RDHLI S R 651 

: I I I I I I I : : : I I 

Db 656 DLPLDYVGLAIL-IVSFRVLAYLALR 680



RESULT 13 
ABG1_HUMAN 

ID ABG1_HUMAN STANDARD; PRT; 678 AA.

AC P45844; Q9BXK6; Q9BXK7; Q9BXK8; Q9BXK9; Q9BXL0; Q9BXL1; Q9BXL2;

AC Q9BXL3; Q9BXL4;

DT 01-NOV-1995 (Rel. 32, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 15-MAR-2004 (Rel. 43, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 1 (White protein homolog) 

DE (ATP-binding cassette transporter 8). 

GN ABCG1 OR ABC8 OR WHT1.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;

OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo. 

OX NCBI_TaxID=9606; 

RN [1] 

RP SEQUENCE OF 3-678 FROM N.A. (ISOFORMS 1 AND 4). 

RC TISSUE=Retina;

RX MEDLINE=96256850; PubMed=8659545;

RA Chen H.M., Rossier C., Lalioti M.D., Lynn A., Chakravarti A.,

RA Perrin G., Antonarakis S.E.; 

RT "Cloning of the cDNA for a human homologue of the Drosophila white 

RT gene and mapping to chromosome 21q22.3."; 

RL Am. J. Hum. Genet. 59:66-75(1996). 

RN [2] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RX MEDLINE=20289799; PubMed=10830953;

RA Hattori M., Fujiyama A., Taylor T.D., Watanabe H., Yada T.,

RA Park H.-S., Toyoda A., Ishii K., Totoki Y., Choi D.-K., Groner Y.,

RA Soeda E., Ohki M., Takagi T., Sakaki Y., Taudien S., Blechschmidt K.,

RA Polley A., Menzel U., Delabar J., Kumpf K., Lehmann R., Patterson D.,

RA Reichwald K., Rump A., Schillhabel M., Schudy A., Zimmermann W.,

RA Rosenthal A., Kudoh J., Shibuya K., Kawasaki K., Asakawa S.,

RA Shintani A., Sasaki T., Nagamine K., Mitsuyama S., Antonarakis S.E.,

RA Minoshima S., Shimizu N., Nordsiek G., Hornischer K., Brandt P.,

RA Scharfe M., Schoen O., Desario A., Reichelt J., Kauer G., Bloecker H.,

RA Ramser J., Beck A., Klages S., Hennig S., Riesselmann L., Dagand E.,

RA Wehrmeyer S., Borzym K., Gardiner K., Nizetic D., Francis F.,

RA Lehrach H., Reinhardt R., Yaspo M.-L.;

RT "The DNA sequence of human chromosome 21."; 

RL Nature 405:311-319(2000). 

RN [3] 

RP SEQUENCE FROM N.A. (ISOFORM 1).

RX MEDLINE=20408883; PubMed=10950923;

RA Berry A., Scott H.S., Kudoh J., Talior I., Korostishevsky M.,

RA Wattenhofer M., Guipponi M., Barras C., Rossier C., Shibuya K.,

RA Wang J., Kawasaki K., Asakawa S., Minoshima S., Shimizu N.,

RA Antonarakis S.E., Bonne-Tamir B.;

RT "Refined localization of autosomal recessive nonsyndromic deafness 

RT DFNB10 locus using 34 novel microsatellite markers, genomic 

RT structure, and exclusion of six known genes in the region."; 

RL Genomics 68:22-29(2000). 

RN [4] 

RP SEQUENCE FROM N.A. (ISOFORM 1). 

RX MEDLINE=21192304; PubMed=11279031;

RA Porsch-Oezcueruemez M., Langmann T., Heimerl S., Borsukova H.,

RA Kaminski W.E., Drobnik W., Honer C., Schumacher C., Schmitz G.;

RT "The zinc finger protein 202 (ZNF202) is a transcriptional repressor

RT of ATP binding cassette transporter A1 (ABCA1) and ABCG1 gene

RT expression and a modulator of cellular lipid efflux."; 

RL J. Biol. Chem. 276:12427-12433(2001). 

RN [5] 

RP SEQUENCE FROM N.A. (ISOFORMS 2; 3; 4; 5; 6 AND 7). 

RX MEDLINE=21092576; PubMed=11162488;

RA Lorkowski S., Rust S., Engel T., Jung E., Tegelkamp K., Galinski E.A.,

RA Assmann G., Cullen P.;

RT "Genomic sequence and structure of the human ABCG1 (ABC8) gene.";

RL Biochem. Biophys. Res. Commun. 280:121-131(2001).

RN [6] 

RP SEQUENCE OF 33-678 FROM N.A. 

RC TISSUE=Fetal brain; 

RX MEDLINE=97186700; PubMed=9034316;

RA Croop J.M., Tiller G.E., Fletcher J.A., Lux M.L., Raab E.,

RA Goldenson D., Arciniegas S., Son D., Wu R.;

RT "Isolation and characterization of a mammalian homolog of the 

RT Drosophila white gene."; 

RL Gene 185:77-85(1997). 

RN [7] 

RP INDUCTION, AND PROBABLE FUNCTION. 

RX MEDLINE=20261604; PubMed=10799558;

RA Venkateswaran A., Repa J.J., Lobaccaro J.-M.A., Bronson A.,

RA Mangelsdorf D.J., Edwards P.A.; 

RT "Human white/murine ABC8 mRNA levels are highly induced in 

RT lipid-loaded macrophages. A transcriptional role for specific 

RT oxysterols.";

RL J. Biol. Chem. 275:14700-14707(2000). 

RN [8] 



RP INDUCTION, AND PROBABLE FUNCTION.

RX MEDLINE=20105556; PubMed=10639163;

RA Klucken J., Buechler C., Orso E., Kaminski W.E.,

RA Porsch-Oezcueruemez M., Liebisch G., Kapinsky M., Diederich W.,

RA Drobnik W., Dean M., Allikmets R., Schmitz G.;

RT "ABCG1 (ABC8), the human homolog of the Drosophila white gene, is a

RT regulator of macrophage cholesterol and phospholipid transport."; 

RL Proc. Natl. Acad. Sci. U.S.A. 97:817-822(2000). 

RN [9] 

RP REVIEW. 

RX MEDLINE=21474438; PubMed=11590207;

RA Schmitz G., Langmann T., Heimerl S.;

RT "Role of ABCG1 and other ABCG family members in lipid metabolism.";

RL J. Lipid Res. 42:1513-1520(2001).

CC -!- FUNCTION: Transporter involved in macrophage lipid homeostasis. Is 
CC an active component of the macrophage lipid export complex. Could 

CC also be involved in intracellular lipid transport processes. The 

CC role in cellular lipid homeostasis may not be limited to 

CC macrophages. 

CC -!- SUBUNIT: May form heterodimers with several heterologous partners 
CC of the ABCG subfamily. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. Predominantly 
CC localized in the intracellular compartments mainly associated with 

CC the endoplasmic reticulum (ER) and Golgi membranes. 

CC -!- ALTERNATIVE PRODUCTS: 

CC Event=Alternative splicing; Named isoforms=7; 

CC Comment=Additional isoforms seem to exist; 

CC Name=1;

CC IsoId=P45844-1; Sequence=Displayed;

CC Name=2; Synonyms=J;

CC IsoId=P45844-2; Sequence=VSP_000047, VSP_000051;

CC Name=3; Synonyms=ABDE;

CC IsoId=P45844-3; Sequence=VSP_000048, VSP_000051;

CC Name=4; Synonyms=G;

CC IsoId=P45844-4; Sequence=VSP_000051;

CC Name=5; Synonyms=F;

CC IsoId=P45844-5; Sequence=VSP_000049, VSP_000051;

CC Name=6; Synonyms=HI;

CC IsoId=P45844-6; Sequence=VSP_000046, VSP_000051; 

CC Name=7; Synonyms=C; 

CC IsoId=P45844-7; Sequence=VSP_000050, VSP_000051; 

CC -!- TISSUE SPECIFICITY: EXPRESSED IN SEVERAL TISSUES. 

CC -!- INDUCTION: Strongly induced in monocyte-derived macrophages during 
CC cholesterol influx. Conversely, mRNA and protein expression are 

CC suppressed by lipid efflux. Induction is mediated by the liver X 

CC receptor/retinoid X receptor (LXR/RXR) pathway.

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way 

CC modified and this statement is not removed. Usage by and for commercial 

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/ 

CC or send an email to license@isb-sib.ch). 

CC 
CAA62631.1
BAA95530 
BAB13728 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
CAC00730 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28836 
AAK28842 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28833 
AAK28838 
AAK28841 
AAK28835 
AAK28835 
AAK28835 
AAK28835 
AAK28835 
AAK28835 



.1 
.2 
.1 
.1 
.1 
.1 
.1 
.1 
.1 
.1 
.1 
.1 
.1 
.1 
.1 
.1
.1
.1
.1
.1
.1
.1
.1
.1 
.1 
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1
.1 
ALT_INIT.
ALT_INIT.
ALT_INIT.
ALT_INIT.
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED.
JOINED. 
JOINED. 
JOINED. 
JOINED. 

JOINED. 
JOINED. 
JOINED.
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 



JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 

ALT_INIT. 

JOINED. 
JOINED. 
JOINED. 
JOINED. 
JOINED. 



DR EMBL; AF323649; AAK28835.1; JOINED. 



Query Match 17.9%; Score 596.5; DB 1; Length 678; 

Best Local Similarity 26.5%; Pred. No. 4.6e-35; 

Matches 165; Conservative 142; Mismatches 265; Indels 51; Gaps 14;
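
The summary figures above appear to be linked: in each result of this listing, the reported Best Local Similarity equals Matches divided by the total number of alignment columns (Matches + Conservative + Mismatches + Indels). This is an observation from the printed numbers, not a documented definition of the search program's statistic. A minimal Python sketch that parses such a summary and recomputes the value; the names `parse_summary` and `local_similarity` are hypothetical helpers introduced only for illustration:

```python
import re

def parse_summary(text):
    """Return {field name: numeric value} for semicolon-separated summary lines."""
    stats = {}
    for field in re.split(r"[;\n]", text):
        # A field is a name (letters, dots, spaces) followed by one number.
        m = re.match(r"\s*([A-Za-z. ]+?)\s+([-+0-9.eE%]+)\s*$", field)
        if m:
            stats[m.group(1)] = float(m.group(2).rstrip("%"))
    return stats

def local_similarity(stats):
    """Matches as a percentage of all alignment columns (assumed relationship)."""
    columns = sum(stats[k] for k in
                  ("Matches", "Conservative", "Mismatches", "Indels"))
    return 100.0 * stats["Matches"] / columns

summary = """Query Match 17.9%; Score 596.5; DB 1; Length 678;
Best Local Similarity 26.5%; Pred. No. 4.6e-35;
Matches 165; Conservative 142; Mismatches 265; Indels 51; Gaps 14"""

stats = parse_summary(summary)
print(round(local_similarity(stats), 1), stats["Best Local Similarity"])
```

Here 165 / (165 + 142 + 265 + 51) = 165 / 623, which rounds to the printed 26.5%.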

Qy 44 SYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRL 103 

I I I I III I : : : : I I : I I I : : : I : I I I : I I : I I : : : : I 

Db 83 SYSVPE--GPWW-----RKKGYKTLLKGISGKFNSGELVAIMGPSGAGKSTLMNILAGY- 134

Qy 104 GRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGS 163

I I I : I I I : I : : I I I I I M : I : : I I : : : I 

Db 135 -RETGMKGAVLINGLPRDLRCFRKVSCYIMQDDMLLPHLTVQEAMMVSAHLKLQEKDEGR 193 

Qy 164 FQKKVEAVMAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGL 223

: : I : : : II I I I : I I : I : I : : I I : I : : I I I I I I I I : I I 

Db 194 -REMVKEILTALGLLSCA-----NTRTGSLSGGQRKRLAIALELVNNPPVMFFDEPTSGL 247

Qy 224 DCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDF 283

I : I : I I : I I :- I : : I I I I I : : I I : I I I : : : I I I : : : I : : : 

Db 248 DSASCFQVVSLMKGLAQGGRSIICTIHQPSAKLFELFDQLYVLSQGQCVYRGKVCNLVPY 307

Qy 284 FNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIESAYKKSA---------I 334

I I I I : I I I I I : : I : : : : I I : I : I : : 
Db 308 LRDLGLNCPTYHNPADFVMEVASGEYGDQNSRLVRAVREGMCDSDHKRDLGGDAEVNPFL 367

Qy 335 CHKTLKNIERMKHLKTLPMVPFKTKDSPGV----------FSKLGVLLRRVTRNLVRNKL 384

I : : : : : I I I I III : : : : | : | : : : | : : 

Db 368 WHRPSEEVKQTKRLKGL------RKDSSSMEGCHSFSASCLTQFCILFKRTFLSIMRDSV 421

Qy 385 AVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNAVNLFPV 444

I : : : : I I : I : : I : I I : : : : : I | | : 

Db 422 LTHLRITSHIGIGLLIGLLYLGIGNEAKK--VLSNSGFLFFSMLFLMFAALMPTVLTFPL 479

Qy 445 LRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVARFGYFSA 504

I : I : I II : : I I :: : : I : I I : II I : I 

Db 480 EMGVFLREHLNYWYSLKAYYLAKTMADVPFQIMFPVAYCSIVYWMTSQPSDAVRFVLFAA 539

Qy 505 ALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIPFKIISY 564

I : : I I I : I I : I : : I I : I I I : : I : : I I 

Db 540 LGTMTSLVAQSLGL-LIGAASTSLQVATFVGPVTAIPVLLFSGFFVSFDTIPTYLQWMSY 598

Qy 565 FTFQKYCSEILVVNEFYGLNFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPGATSRFTMNF 624

: : : I I : : : : III: : : : III : | : : : : : I 

Db 599 ISYVRYGFEGVILS-IYGLD----REDLHCDIDETCHF-QKSEAILRELDVENAKLYLDF 652

Qy 625 LILYSFIPALVILGIVV--FKIR 645

: : I I : I : : I : I I I 
Db 653 IVLGIFFISLRLIAYFVLRYKIR 675
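
Each `Qy`/`Db` row carries the first and last residue number of the segment it shows, so the number of letters in the row should equal end - start + 1, with gap characters not counted. That gives a quick consistency check for rows that may have been damaged in transcription. A minimal sketch, assuming rows of the form `label start sequence end` as printed in this listing; `check_row` is a hypothetical helper, not part of the search program:

```python
import re

# label, start coordinate, sequence text (may contain gaps or spaces), end coordinate
ROW = re.compile(r"^(Qy|Db)\s+(\d+)\s+(.*?)\s+(\d+)\s*$")

def check_row(line):
    """Return (label, start, end, consistent) or None if the line is not a row."""
    m = ROW.match(line)
    if not m:
        return None
    label, start, body, end = m.group(1), int(m.group(2)), m.group(3), int(m.group(4))
    residues = sum(1 for ch in body if ch.isalpha())  # gaps and spaces are skipped
    return label, start, end, residues == end - start + 1

row = "Qy 44 SYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRL 103"
print(check_row(row))
```

A row whose letter count disagrees with its coordinates has lost or gained a residue somewhere and is worth re-reading against the other alignments of the same query.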



RESULT 14 
WHIT_CERCA 

ID WHIT_CERCA STANDARD; PRT; 679 AA. 

AC Q17320; 

DT 01-NOV-1997 (Rel. 35, Created) 

DT 01-NOV-1997 (Rel. 35, Last sequence update) 



DT 16-OCT-2001 (Rel. 40, Last annotation update) 

DE White protein. 

GN W. 

OS Ceratitis capitata (Mediterranean fruit fly).

OC Eukaryota; Metazoa; Arthropoda; Hexapoda; Insecta; Pterygota;

OC Neoptera; Endopterygota; Diptera; Brachycera; Muscomorpha;

OC Tephritoidea; Tephritidae; Ceratitis.

OX NCBI_TaxID=7213;

RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=96123276; PubMed=8533095;

RA Zwiebel L.J., Saccone G., Zacharopoulou A., Besansky N.J.,

RA Favia G., Collins F.H., Louis C., Kafatos F.C.;

RT "The white gene of Ceratitis capitata: a phenotypic marker for

RT germline transformation.";

RL Science 270:2005-2007(1995).

CC -!- FUNCTION: May be part of a membrane-spanning permease system 
CC necessary for the transport of pigment precursors into pigment 

CC cells responsible for eye color. 

CC -!- SUBCELLULAR LOCATION: Integral membrane protein. 

CC -!- SIMILARITY: Belongs to the ABC transporter family. MDR subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; X89933; CAA61998.1; -. 

DR InterPro; IPR003593; AAA_ATPase. 

DR InterPro; IPR003439; ABC_transporter.

DR InterPro; IPR005284; Pigment_permease.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR TIGRFAMs; TIGR00955; 3a01204; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW Pigment; ATP-binding; Transmembrane; Transport.

FT NP_BIND 121 128 ATP (BY SIMILARITY).

FT TRANSMEM 427 445 POTENTIAL. 

FT TRANSMEM 457 477 POTENTIAL. 

FT TRANSMEM 507 525 POTENTIAL. 

FT TRANSMEM 534 555 POTENTIAL. 

FT TRANSMEM 568 586 POTENTIAL. 

FT TRANSMEM 651 670 POTENTIAL. 

FT CARBOHYD 628 628 N-LINKED (GLCNAC...) (POTENTIAL).

FT CARBOHYD 643 643 N-LINKED (GLCNAC...) (POTENTIAL).
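
The FT lines above give each feature as a key followed by its start and end residue numbers. A small sketch, assuming whitespace-separated fields exactly as printed in this listing, that pulls out the TRANSMEM segments and their lengths; `parse_features` is a hypothetical helper and not part of any SWISS-PROT tooling:

```python
def parse_features(lines):
    """Return (key, start, end, description) tuples from FT lines."""
    features = []
    for line in lines:
        parts = line.split(None, 4)  # FT, key, start, end, rest of line
        if (len(parts) >= 4 and parts[0] == "FT"
                and parts[2].isdigit() and parts[3].isdigit()):
            desc = parts[4].strip() if len(parts) > 4 else ""
            features.append((parts[1], int(parts[2]), int(parts[3]), desc))
    return features

ft_lines = [
    "FT NP_BIND 121 128 ATP (BY SIMILARITY).",
    "FT TRANSMEM 427 445 POTENTIAL.",
    "FT TRANSMEM 457 477 POTENTIAL.",
    "FT TRANSMEM 507 525 POTENTIAL.",
    "FT TRANSMEM 534 555 POTENTIAL.",
    "FT TRANSMEM 568 586 POTENTIAL.",
    "FT TRANSMEM 651 670 POTENTIAL.",
]

# Length of a segment is end - start + 1 because both ends are inclusive.
tm_lengths = [end - start + 1
              for key, start, end, _ in parse_features(ft_lines)
              if key == "TRANSMEM"]
print(len(tm_lengths), tm_lengths)
```

For the WHIT_CERCA entry this lists six predicted membrane-spanning segments of 19 to 22 residues each.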

SQ SEQUENCE 679 AA; 75145 MW; 3F9CBC78A835C4CC CRC64; 

Query Match 17.8%; Score 591; DB 1; Length 679; 

Best Local Similarity 28.4%; Pred. No. 1.1e-34;

Matches 176; Conservative 125; Mismatches 231; Indels 88; Gaps 18; 
Qy 66 RQILKDVSLYVESGQIMCILGSSGSGKTTLLDAMSGRLGRAGTFLGEV--YVNGRALRRE 123



Db 101 KHLLKNDSGVAYPGELLAVMGSSGAGKTTLLNASAFRSSKGVQISPSTIRMLNGHPVDAK 160 

Qy 124 QFQDCFSYVLQSDTLLSSLTVRETLHYTALLAIRRGNPGSFQK--KVEAVMAELSLSHVA 181

: I : I I I I : I I I II I : I : : : I : II : I : I : : I I I 

Db 161 EMQARCAYVQQDDLFIGSLTAREHLIFQAMVRMPR-HMTQKQKVQRVDQVIQDLSLGKCQ 219 

Qy 182 DRLIG-NYSLGGISTGERRRVSIAAQLLQDPKVMLFDEPTTGLDCMTANQIVVLLVELAR 240

: I I I : I : I I I I : I : : I : : I I I : : : I I I I : I I I I : : I : I : I : : 

Db 220 NTLIGVPGRVKGLSGGERKRLAFASEALTDPPLLICDEPTSGLDSFMAHSVVQVLKKLSQ 279

Qy 241 RNRIVVLTIHQPRSELFQLFDKIAILSFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDF 300

: : I : I I I I I I I I I I : I I I I I : : : I : I I I I I : I I I : | | | : |M 
Db 280 KGKTVILTIHQPSSELFELFDKILLMAEGRVAFLGTPGEAVDFFSYIGATCPTNYTPADF 339 

Qy 301 YMDLTSVDTQSKEREIETSKRVQMIESAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKD 360 

I : : : I I I : I : I I I : : : : | : | : | : | 

Db 340 YVQVLAV   VPGREVESRDRVAKICDNFAVGKVSREMEQNFQ   KLVKSNGFGKED 391

Qy 361 SPGVFSKLGVLLRRVTRNLVRNKLAVITRLLQNLIMGLFL-LFFVLRVRSNVLK 413

I : : I I : : : : II I I I I : : : : I I : : : I 
Db 392 ENEYTYKASWFMQFRAVLWRSWLSVLKEPLLVKVRLLQTTMVAVLIGLIFLGQQLTQV-- 449

Qy 414 GAIQDRVGLLYQFVGATPYTGMLNAVNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLP 473

: : I : : I : : : : I I : I : : I I : I : I I 

Db 450 -GVMNINGAIFLFLTNMTFQNSFATITVFTTELPVFMRETRSRLYRCDTYFLGKTIAELP 508 

Qy 474 FSVVATMIFSSVCYWTLGLHPEVARFGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSV 533

: I : I : : : I : I | | | | | | | : : | | : | 

Db 509 LFLVVPFLFTAIAYPLIGLRPGVDHF---------------FTALALVTLV--ANVSTSF 551

Qy 534 VALLS----------------IAGVLVGSGFLRNIQEMPIPFKIISYFTFQKYCSEILVV 577

I : I I hill : I : I I : I I : : : I : I I : : 

Db 552 GYLISCACSSTSMALSVGPPVIIPFLLFGGFFLNSGSVPVYFKWLSYLSWFRYANEGLLI 611 

Qy 578 NEFYGL---NFTCGSSNVSVTTNPMCAFTQGIQFIEKTCP--GATSRFTMNFL---ILYS 629

I : : : I I I I I I I I I : I I : : 

Db 612 NQWADVKPGEITCTLSNT-------------------TCPSSGEVILETLNFSASDLPFD 652

Qy 630 FIP-ALVILGIVVFKIRDHL 648

II I I : I : I I : I : : 

Db 653 FIGLALLIVG FRI SAY I 669 



RESULT 15 
ABG4_HUMAN 

ID ABG4_HUMAN STANDARD; PRT; 646 AA.

AC Q9H172; 

DT 28-FEB-2003 (Rel. 41, Created) 

DT 28-FEB-2003 (Rel. 41, Last sequence update) 

DT 10-OCT-2003 (Rel. 42, Last annotation update) 

DE ATP-binding cassette, sub-family G, member 4. 

GN ABCG4 OR WHITE2.

OS Homo sapiens (Human).

OC Eukaryota; Metazoa; Chordata; Craniata; Vertebrata; Euteleostomi;
OC Mammalia; Eutheria; Primates; Catarrhini; Hominidae; Homo.
OX NCBI_TaxID=9606;



RN [1] 

RP SEQUENCE FROM N.A. 

RX MEDLINE=21518231; PubMed=11606068;

RA Engel T., Lorkowski S., Lueken A., Rust S., Schlueter B., Berger G.,

RA Cullen P., Assmann G.;

RT "The human ABCG4 gene is regulated by oxysterols and retinoids in

RT monocyte-derived macrophages.";

RL Biochem. Biophys. Res. Commun. 288:483-488(2001). 

RN [2] 

RP SEQUENCE FROM N.A. 

RC TISSUE=Brain; 

RX MEDLINE=22388257; PubMed=12477932;

RA Strausberg R.L., Feingold E.A., Grouse L.H., Derge J.G.,

RA Klausner R.D., Collins F.S., Wagner L., Shenmen C.M., Schuler G.D.,

RA Altschul S.F., Zeeberg B., Buetow K.H., Schaefer C.F., Bhat N.K.,

RA Hopkins R.F., Jordan H., Moore T., Max S.I., Wang J., Hsieh F.,

RA Diatchenko L., Marusina K., Farmer A.A., Rubin G.M., Hong L.,

RA Stapleton M., Soares M.B., Bonaldo M.F., Casavant T.L., Scheetz T.E.,

RA Brownstein M.J., Usdin T.B., Toshiyuki S., Carninci P., Prange C.,

RA Raha S.S., Loquellano N.A., Peters G.J., Abramson R.D., Mullahy S.J.,

RA Bosak S.A., McEwan P.J., McKernan K.J., Malek J.A., Gunaratne P.H.,

RA Richards S., Worley K.C., Hale S., Garcia A.M., Gay L.J., Hulyk S.W.,

RA Villalon D.K., Muzny D.M., Sodergren E.J., Lu X., Gibbs R.A.,

RA Fahey J., Helton E., Ketteman M., Madan A., Rodrigues S., Sanchez A.,

RA Whiting M., Madan A., Young A.C., Shevchenko Y., Bouffard G.G.,

RA Blakesley R.W., Touchman J.W., Green E.D., Dickson M.C.,

RA Rodriguez A.C., Grimwood J., Schmutz J., Myers R.M.,

RA Butterfield Y.S.N., Krzywinski M.I., Skalska U., Smailus D.E.,

RA Schnerch A., Schein J.E., Jones S.J.M., Marra M.A.;

RT "Generation and initial analysis of more than 15,000 full-length 

RT human and mouse cDNA sequences."; 

RL Proc. Natl. Acad. Sci. U.S.A. 99:16899-16903(2002).

RN [3] 

RP SEQUENCE OF 20-646 FROM N.A.

RC TISSUE=Dorsal root ganglion; 

RX MEDLINE=22170423; PubMed=12183068 ; 

RA Oldfield S., Lowry C., Ruddick J., Lightman S.;

RT "ABCG4: a novel human white family ABC-transporter expressed in the 

RT brain and eye."; 

RL Biochim. Biophys. Acta 1591:175-179(2002). 

CC -!- FUNCTION: May be involved in macrophage lipid homeostasis. 
CC -!- SUBCELLULAR LOCATION: Integral membrane protein (Probable).

CC -!- SIMILARITY: Belongs to the ABC transporter family. ABCG (White) 
CC subfamily. 

CC 

CC This SWISS-PROT entry is copyright. It is produced through a collaboration 

CC between the Swiss Institute of Bioinformatics and the EMBL outstation -

CC the European Bioinformatics Institute. There are no restrictions on its

CC use by non-profit institutions as long as its content is in no way

CC modified and this statement is not removed. Usage by and for commercial

CC entities requires a license agreement (See http://www.isb-sib.ch/announce/

CC or send an email to license@isb-sib.ch).

CC

DR EMBL; AJ308237; CAC87131.1; -. 

DR EMBL; BC041091; AAH41091.1; -. 

DR EMBL; AJ300465; CAC17140.1; -. 

DR PIR; JC7777; JC7777. 



DR Genew; HGNC:13884; ABCG4.

DR MIM; 607784;

DR InterPro; IPR003593; AAA_ATPase.

DR InterPro; IPR003439; ABC_transporter.

DR Pfam; PF00005; ABC_tran; 1.

DR ProDom; PD000006; ABC_transporter; 1.

DR SMART; SM00382; AAA; 1.

DR PROSITE; PS00211; ABC_TRANSPORTER_1; 1.

DR PROSITE; PS50893; ABC_TRANSPORTER_2; 1.

KW ATP-binding; Glycoprotein; Transmembrane; Transport.

FT DOMAIN 1 393 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 394 414 1 (POTENTIAL).

FT DOMAIN 415 425 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 426 446 2 (POTENTIAL).

FT DOMAIN 447 472 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 473 493 3 (POTENTIAL).

FT DOMAIN 494 503 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 504 524 4 (POTENTIAL).

FT DOMAIN 525 532 CYTOPLASMIC (POTENTIAL).

FT TRANSMEM 533 553 5 (POTENTIAL).

FT DOMAIN 554 617 EXTRACELLULAR (POTENTIAL).

FT TRANSMEM 618 638 6 (POTENTIAL).

FT DOMAIN 639 646 CYTOPLASMIC (POTENTIAL).

FT NP_BIND 102 109 ATP (POTENTIAL).

FT CARBOHYD 422 422 N-LINKED (GLCNAC...) (POTENTIAL).

SQ SEQUENCE 646 AA; 71895 MW; 9CCEC6E150772611 CRC64;



Query Match 17.4%; Score 578.5; DB 1; Length 646; 

Best Local Similarity 27.1%; Pred. No. 8.4e-34; 

Matches 171; Conservative 126; Mismatches 274; Indels 59; Gaps 14;

Qy 33 PEPHSLGILHASYSVSHRVRPWWDITSCRQQWTRQILKDVSLYVESGQIMCILGSSGSGK 92

I: I I I I I I |:: : :|| :| ::: |:| ||:|| 

Db 54 PKRSAVDIEFVELSYSVREGPCW-----RKRGYKTLLKCLSGKFCRRELIGIMGPSGAGK 108

Qy 93 TTLLDAMSGRLGRAGTFLGEVYVNGRALRREQFQDCFSYVLQSDTLLSSLTVRETLHYTA 152

:| :: ::| I |:: MM |: MM I M Mil: : I 

Db 109 STFMNILAGY--RESGMKGQILVNGRPRELRTFRKMSCYIMQDDMLLPHLTVLEAMMVSA 166

Qy 153 LLAIRRGNPGSFQKKVEAV     MAELSLSHVADRLIGNYSLGGISTGERRRVSIAAQ 206

I : : I I I : M II I :| MMMMI : 

Db 167 NLKLSEKQ EVKKELVTEILTALGLMSCSHTRTAL LSGGQRKRLAIALE 214 

Qy 207 LLQDPKVMLFDEPTTGLDCMTANQIVVLLVELARRNRIVVLTIHQPRSELFQLFDKIAIL 266

I: M II MMMMI : MIM IM I : : M I I I : M M M II : I I 
Db 215 LVNNPPVMFFDEPTSGLDSASCFQVVSLMKSLAQGGRTIICTIHQPSAKLFEMFDKLYIL 274

Qy 267 SFGELIFCGTPAEMLDFFNDCGYPCPEHSNPFDFYMDLTSVDTQSKEREIETSKRVQMIE 326 



... i ii • I i ii ... i

Db 275 SQGQCIFKGVVTNLIPYLKGLGLHCPTYHNPADFIIEVASG---------EYGDLNPMLF 325

Qy 327 SAYKKSAICHKTLKNIERMKHLKTLPMVPFKTKDSPGVFSKLGVLLRRVTRN 378

Db 326 384

Qy 379 LVRNKLAVITRLLQNLIMGLFLLFFVLRVRSNVLKGAIQDRVGLLYQFVGATPYTGMLNA 438

Db 385 ILRDTVLTHLRFMSHVVIGVLIGLLYLHIGDDASK--VFNNTGCLFFSMLFLMFAALMPT 442

Qy 439 VNLFPVLRAVSDQESQDGLYQKWQMMLAYALHVLPFSVVATMIFSSVCYWTLGLHPEVAR 498

I II: II : I : I II : : II I I : : : I : II I I : I 

Db 443 VLTFPLEMAVFMREHLNYWYSLKAYY 502 

Qy 499 FGYFSAALLAPHLIGEFLTLVLLGIVQNPNIVNSVVALLSIAGVLVGSGFLRNIQEMPIP 558

I M I I I : : I I I : I I I : I : : I I : I I I : : : | 
Db 503 FLLFSALATATALVAQSLGL-LIGAASNSLQVATFVGPVTAIPVLLFSGFFVSFKTIPTY 561

Qy 559 FKIISYFTFQKYCSEILVVNEFYGL---NFTCGSSNVSVTTNPMCAFTQGIQFIEKTCPG 615

: II :: :| I ::: ||: : II I I : I I : 

Db 562 LQWSSYLSYVRYGFEGVILT-IYGMERGDLTC--------LEERCPFREP-QSILRALDV 611

Qy 616 ATSRFTMNFLILYSFIPALVILGIVVFKIR 645

: : I : I I : I I I I : I : I : I 
Db 612 EDAKLYMDFLVLGIFFLALRLLAYLVLRYR 641

Search completed: February 27, 2004, 07:12:39 
Job time : 12.0797 secs



